PDC Metadata API

Support

Polar Data Catalogue (pdc@uwaterloo.ca)

Date Published: 03-31-2023

Date of Last Update: 03-31-2023


INTRODUCTION

Overview

The PDC Metadata Application Programming Interface (API) is an online resource that provides access to PDC metadata records. This service is implemented as a collection of URL endpoints that can be invoked using HTTP, in order to receive a response. In general, this service is analogous to a query system where API requests (i.e. queries) are made and corresponding responses (i.e. results) are returned. In particular, any given response is a representation of zero or more structured metadata records. This document will describe how to use this metadata API, in order to support activities such as ad hoc queries, automated metadata harvesting, programmatic data mining, and data visualization development.

Metadata Structure

Currently, this API provides both XML and JSON responses. The XML is compliant with ISO 19115 standards and the contents are equivalent to the XML that can be extracted from the Geospatial Search Tool. More details on the JSON format are provided below.

PDC JSON responses provide schema.org compliant JSON-LD. As a result, the contents are structured and nested to represent a Dataset, where a Dataset element corresponds to a PDC metadata record. If a response contains one or more metadata records, each Dataset element is wrapped in an itemListElement, and the set of itemListElements is contained within an itemList (i.e. an array of itemListElements). It is therefore necessary to understand the structure of the metadata within the API response in order to extract content of interest effectively (see Consuming API Responses).

GETTING STARTED

OS Environment Requirements

  1. Any OS that can run curl: curl is a command-line tool for sending requests to an API. You invoke the API with curl and the API returns a JSON or XML response. JavaScript Object Notation (JSON) is a standard text-based format for representing structured data, based on JavaScript object syntax.
  2. A network connection

There are two ways to use curl:

  1. Use curl from the command line/terminal/console
  2. Execute curl commands directly from the browser

Option 1: Use curl from Command Line/Terminal

  1. To check whether curl is installed, open a terminal, console, or command-line interface and run curl --version
			
				
C:\users\anyuser> curl --version
curl 7.55.1 (Windows) libcurl/7.55.1 WinSSL  
Release-Date: 2017-11-14, security patched: 2019-11-05
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp  
Features: AsynchDNS IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL
		
  2. If curl is already installed, skip to STEP-BY-STEP. Otherwise, follow the instructions for your operating system below.

Windows

  1. Download the curl executable zip file from https://curl.haxx.se/
  2. Go to the bin folder inside the .zip file
  3. Find the curl executable file in curl-win64.zip/bin/curl
  4. Move the curl executable file to C:/Windows/System32/
  5. Open command line and confirm curl is installed
			
curl --version
			
		

Mac OS

  1. Check if brew is installed
			
				
brew -v
		
  2. If brew is not installed, use this command to install it
			
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
			
		
  3. Install curl
			
				
brew install curl
		

Linux

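  1. Install curl (the command below is for Debian- and Ubuntu-based distributions; other distributions use their own package manager)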
			
sudo apt-get install curl
		

Option 2: Execute curl commands directly from your browser.

Several websites, such as Reqbin, let you send a curl request to an API and then display the response. You don't need to have curl installed on your machine, and you don't need a terminal, console, or command line. This is a good option if you only need to make a few API requests.

STEP-BY-STEP

Making API Requests

This API provides access to publicly available PDC metadata records. There are a variety of methods for acquiring records that correspond to the different API endpoints.


Find records by page

Responses are returned in batches of up to 20 records, and the batch contents depend on the page number provided. Omitting the page number returns the first batch of records.

https://polardata.ca/api/metadata?page={page}

Parameters Description
{page} A page number used to retrieve a corresponding batch of records

A JSON request to the API for page 5 would look like this:

			
				
curl -X GET "https://polardata.ca/api/metadata?page=5" -H "accept: */*"
			
		

An XML request to the API for page 5 would look like this:

			
				
curl -X GET "https://polardata.ca/api/metadata/xml?page=5" -H "accept: */*"
			
		

Responses

Code Description
200 OK - The requested page or id was found
400 Bad Request - The request was malformed or invalid
404 Not Found - The requested page or id was not found
429 Too Many Requests - This API is rate limited
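
When scripting against this endpoint, it can help to build the URL in one place and to capture the status code separately from the response body. The following is a minimal sketch for a POSIX shell. The function names pdc_page_url and pdc_fetch are hypothetical helpers introduced here for illustration and are not part of the API; only the URL patterns are taken from the examples above.

```shell
#!/bin/sh
# Base URL as used in the request examples in this document.
PDC_BASE="https://polardata.ca/api/metadata"

# pdc_page_url FORMAT PAGE  (hypothetical helper)
# Prints the paged-listing URL for FORMAT, which is "json" or "xml".
pdc_page_url() {
    case "$1" in
        json) printf '%s?page=%s\n' "$PDC_BASE" "$2" ;;
        xml)  printf '%s/xml?page=%s\n' "$PDC_BASE" "$2" ;;
        *)    echo "unknown format: $1" >&2; return 1 ;;
    esac
}

# pdc_fetch URL OUTFILE  (hypothetical helper)
# Saves the response body to OUTFILE and prints only the HTTP status code,
# so a script can compare it against the codes in the Responses table.
pdc_fetch() {
    curl -s -o "$2" -w '%{http_code}' "$1"
}

# Offline demonstration: print the URLs without contacting the server.
pdc_page_url json 5
pdc_page_url xml 5
```

A call such as pdc_fetch "$(pdc_page_url json 5)" page5.json would then write the body to page5.json and print the status code on its own.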


Find a single record by id

https://polardata.ca/api/metadata/id/{id}

Parameters Description
{id} An assigned CCIN Reference Number. Must be an already existing CCIN Reference Number.

A JSON request to the API for a record with CCIN Reference Number 13264 would look like this:

			
				
curl -X GET "https://polardata.ca/api/metadata/id/13264" -H "accept: */*"
			
		

An XML request to the API for a record with CCIN Reference Number 13264 would look like this:

			
				
curl -X GET "https://polardata.ca/api/metadata/xml/id/13264" -H "accept: */*"
			
		

Responses

Code Description
200 OK - The requested page or id was found
400 Bad Request - The request was malformed or invalid
404 Not Found - The requested page or id was not found
429 Too Many Requests - This API is rate limited
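
A script that looks up single records usually needs to branch on the status code rather than just display it. The sketch below maps the four documented codes to a short message and a success or failure exit status. The helper name pdc_describe_status and the message wording are assumptions made for illustration; the status code itself would typically come from curl's -w '%{http_code}' option.

```shell
#!/bin/sh
# pdc_describe_status CODE  (hypothetical helper)
# Prints a short explanation for the status codes documented above and
# returns 0 only for 200, so it can be used directly in an `if`.
pdc_describe_status() {
    case "$1" in
        200) echo "OK: record found"; return 0 ;;
        400) echo "Bad Request: check the URL for malformed or invalid parts" ;;
        404) echo "Not Found: no record with this id (or no such page)" ;;
        429) echo "Too Many Requests: rate limit reached, stop and retry later" ;;
        *)   echo "Unexpected status: $1" ;;
    esac
    return 1
}

# Offline demonstration with literal codes instead of live responses.
pdc_describe_status 200
pdc_describe_status 404 || true
```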


Find records by Research Program

The search is not case sensitive. Responses are returned in batches of up to 20 records, and the batch contents depend on the page number provided. Omitting the page number returns the first batch of records.

https://polardata.ca/api/metadata/program/{name}?page={page}

Parameters Description
{page} A page number used to retrieve a corresponding batch of records. Default value: 0
{name} Text that is equal to, or part of, an existing program name, using %20 as the space character

A JSON request to the API for records from 'Amundsen Science' with page '0' would look like this:

			
				
curl -X GET "https://polardata.ca/api/metadata/program/amundsen%20science?page=0" -H "accept: */*"
			
		

An XML request to the API for records from 'Amundsen Science' with page '0' would look like this:

			
				
curl -X GET "https://polardata.ca/api/metadata/xml/program/amundsen%20science?page=0" -H "accept: */*"
			
		

Responses

Code Description
200 OK - The requested page or id was found
400 Bad Request - The request was malformed or invalid
404 Not Found - The requested page or id was not found
429 Too Many Requests - This API is rate limited


Find new or modified records by date

The date must be a valid date represented as a string in YYYY-MM-DD format. Responses are returned in batches of up to 20 records, and the batch contents depend on the page number provided. Omitting the page number returns the first batch of records.

https://polardata.ca/api/metadata/since/{date}?page={page}

Parameters Description
{page} A page number used to retrieve a corresponding batch of records. Default value: 0
{date} A cutoff date from which to start the search, up to the present date. This endpoint is intended to keep collections updated by providing delta harvest capability. Therefore, the since date cannot be more than one year before the current date.

A JSON request to the API for records since 2022-12-31 would look like this:

			
				
curl -X GET "https://polardata.ca/api/metadata/since/2022-12-31?page=0" -H "accept: */*"
			
		

An XML request to the API for records since 2022-12-31 would look like this:

			
				
curl -X GET "https://polardata.ca/api/metadata/xml/since/2022-12-31?page=0" -H "accept: */*"
			
		

Responses

Code Description
200 OK - The requested page or id was found
400 Bad Request - The request was malformed or invalid
404 Not Found - The requested page or id was not found
429 Too Many Requests - This API is rate limited
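
Because the since date is restricted to the past year, a harvesting script may want to check the date locally before spending a request on it. The following sketch shows one way to do that in a POSIX shell. The helper name pdc_since_ok is hypothetical, and two details are assumptions rather than documented behaviour: that a date exactly one year old is still accepted, and that future dates should be rejected.

```shell
#!/bin/sh
# pdc_since_ok DATE [TODAY]  (hypothetical helper)
# Returns 0 if DATE looks like YYYY-MM-DD and is not more than one year
# before TODAY (defaults to the system date). TODAY is a parameter only so
# the check can be exercised with fixed dates.
pdc_since_ok() {
    since="$1"
    today="${2:-$(date +%Y-%m-%d)}"

    # Shape check only: digits and dashes in the right places. It does not
    # reject impossible dates such as 2023-13-40.
    case "$since" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) return 1 ;;
    esac

    # Cutoff = same month and day, previous year.
    cutoff="$(( ${today%%-*} - 1 ))-${today#*-}"

    # ISO dates compare correctly as plain integers once dashes are removed.
    since_n=$(printf '%s' "$since" | tr -d '-')
    cutoff_n=$(printf '%s' "$cutoff" | tr -d '-')
    today_n=$(printf '%s' "$today" | tr -d '-')
    [ "$since_n" -ge "$cutoff_n" ] && [ "$since_n" -le "$today_n" ]
}

# Fixed dates, so the demonstration does not depend on when it is run.
pdc_since_ok 2022-12-31 2023-03-31 && echo "2022-12-31 accepted"
pdc_since_ok 2021-12-31 2023-03-31 || echo "2021-12-31 rejected"
```

The server remains the authority on which dates it accepts; a local check like this only avoids requests that are very likely to fail.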


Find records by title

The search is not case sensitive. Responses are returned in batches of up to 20 records, and the batch contents depend on the page number provided. Omitting the page number returns the first batch of records.

https://polardata.ca/api/metadata/title/{title}?page={page}

Parameters Description
{page} A page number used to retrieve a corresponding batch of records. Default value: 0
{title} Text that is equal to, or part of, an existing record title, using %20 as the space character

A JSON request to the API for records with title "ice cap" and page '0' would look like this:

			
				
curl -X GET "https://polardata.ca/api/metadata/title/ice%20cap?page=0" -H "accept: */*"
			
		

An XML request to the API for records with title "ice cap" and page '0' would look like this:

			
				
curl -X GET "https://polardata.ca/api/metadata/xml/title/ice%20cap?page=0" -H "accept: */*"
			
		

Responses

Code Description
200 OK - The requested page or id was found
400 Bad Request - The request was malformed or invalid
404 Not Found - The requested page or id was not found
429 Too Many Requests - This API is rate limited

ADVANCED

API Rate Limiting

This API allows a maximum of 500 requests per day. Excess requests will result in an HTTP 429 - Too Many Requests response status code, indicating that the user has sent too many requests in the given amount of time. As a reference, requesting the entire PDC metadata collection requires only about 150 API calls.
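
The relationship between record counts, page size, and the daily allowance is simple arithmetic, sketched below for a POSIX shell. The helper names pdc_calls_needed and pdc_fits_budget are hypothetical, and the record count of 3000 is an illustrative figure chosen to match the "about 150 calls" mentioned above, not a number taken from the catalogue.

```shell
#!/bin/sh
PAGE_SIZE=20      # records per response, as stated in this document
DAILY_LIMIT=500   # requests per day, as stated in this document

# pdc_calls_needed RECORDS  (hypothetical helper)
# Prints the number of page requests needed to retrieve RECORDS records,
# i.e. RECORDS / PAGE_SIZE rounded up.
pdc_calls_needed() {
    echo $(( ($1 + $PAGE_SIZE - 1) / $PAGE_SIZE ))
}

# pdc_fits_budget RECORDS ALREADY_USED  (hypothetical helper)
# Returns 0 if the harvest fits in what is left of today's allowance.
pdc_fits_budget() {
    [ $(( $(pdc_calls_needed "$1") + $2 )) -le "$DAILY_LIMIT" ]
}

# 3000 is an illustrative record count, not a figure from the catalogue.
echo "calls for 3000 records: $(pdc_calls_needed 3000)"
```

A harvesting loop that pages until the first non-200 response spends one additional request on that final probe, so it is worth leaving a small margin below the limit.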

Consuming API Responses

This section shows how to invoke the API programmatically and capture the response contents for further manipulation. The following examples use the R programming language, which can be used for many data-related activities, such as inferential statistics, plot generation, and interactive data visualizations. All code snippets show how to convert API responses from JSON into an R data frame, and they rely on the httr, jsonlite, dplyr, and stringr packages.

How to get all metadata

Because the metadata API only returns a fixed number of records per call, we need to use conditional looping to sequentially harvest and aggregate the responses. The general strategy is to increment the requested page number until no further records are returned, as indicated by the response code. More advanced implementations are encouraged to use R's tryCatch feature, which can be very helpful for detecting and skipping malformed URLs, as well as reporting other issues as console output (see below).

			
library(httr)      # GET(), status_code(), content()
library(jsonlite)  # fromJSON()
library(dplyr)     # bind_rows()

metadata <- data.frame()
apiURL <- "https://polardata.ca/api/metadata?page="
page <- 0
status <- 200

# Request successive pages until the API stops returning 200 OK
while (status == 200) {
    apiCall <- paste0(apiURL, page)
    response <- GET(apiCall)
    status <- status_code(response)
    print(paste0(apiCall, ": ", status))
    if (status == 200) {
        # Parse the JSON-LD body and append this page's records
        json_text <- content(response, "text", encoding = "UTF-8")
        json_data <- fromJSON(json_text, flatten = TRUE)
        metadata <- bind_rows(metadata, json_data$itemListElement)
        page <- page + 1
    }
}

View(metadata)
		

How to get metadata by topic

We can approximate a topic search by using the Find records by title endpoint. In this example, we set up a variable called topic, so that only its value needs to be changed for a new search. A nested loop could also be used for a list of topics instead of a single value, with the outer loop cycling through the list. Note the use of the tryCatch block in this implementation.

			
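library(httr)      # GET(), status_code(), content()
library(jsonlite)  # fromJSON()
library(dplyr)     # bind_rows()
library(stringr)   # str_replace_all()

metadata <- data.frame()
apiURL <- "https://polardata.ca/api/metadata/title/"
topic <- "ice cap"
page <- 0
status <- 200

# Drop slashes and encode spaces so the topic is safe to place in the URL
encodedTopic <- str_replace_all(str_replace_all(topic, "/", ""), " ", "%20")

tryCatch({
  while (status == 200) {
    apiCall <- paste0(apiURL, encodedTopic, "?page=", page)
    response <- GET(apiCall)
    status <- status_code(response)
    print(paste0(apiCall, ": ", status))
    if (status == 200) {
      # Parse the JSON-LD body and append this page's records
      json_text <- content(response, "text", encoding = "UTF-8")
      json_data <- fromJSON(json_text, flatten = TRUE)
      metadata <- bind_rows(metadata, json_data$itemListElement)
      page <- page + 1
    }
  }
}, error = function(e) {
  # Report the problem on the console instead of stopping the script
  print(paste0("Error: ", conditionMessage(e)))
})

View(metadata)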
		

TROUBLESHOOTING/COMMON ERRORS

Malformed API Requests

ASCII characters

Spaces, slashes, and other reserved characters can cause API requests to fail when they are not properly encoded within the request. For this API, all spaces should be replaced with %20 (e.g. ice%20cap, instead of ice cap) and all slashes should be omitted (e.g. POLARCHARS, instead of POLAR/CHARS).
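
For command-line use, the two rules above can be applied with a short sed pipeline before the text is placed in a URL. This is a minimal sketch; the helper name pdc_encode_term is hypothetical, and the function handles only spaces and slashes, so it should not be mistaken for a general URL encoder.

```shell
#!/bin/sh
# pdc_encode_term TEXT  (hypothetical helper)
# Applies the two rules described above: drop slashes, then turn each space
# into %20. Other reserved characters (for example & or ?) are passed
# through unchanged.
pdc_encode_term() {
    printf '%s\n' "$1" | sed -e 's|/||g' -e 's| |%20|g'
}

pdc_encode_term "ice cap"       # prints ice%20cap
pdc_encode_term "POLAR/CHARS"   # prints POLARCHARS
```

The result can be substituted directly into a request, for example curl "https://polardata.ca/api/metadata/title/$(pdc_encode_term "ice cap")?page=0".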

Invalid arguments

Ensure that arguments, like page numbers, make sense and remain within their expected scope, to avoid 404 responses. For example, requesting a negative page number violates the expected range of values. Similarly, requesting a page number greater than the maximum value will also cause a 404 response. However, in this case, since the maximum value is unknown by the caller, it might be necessary to attempt an out-of-bounds call, as shown in the example above. In such cases, response testing should immediately terminate incremental loops, in order to truncate invalid request attempts.
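
The termination rule described above can be sketched as a small shell loop that stops at the first response other than 200. Everything named here is hypothetical: pdc_count_pages is an illustrative helper, fake_fetch is a stand-in that simulates a three-page service so the sketch runs without network access, and the default cap of 500 pages simply mirrors the daily request limit.

```shell
#!/bin/sh
# pdc_count_pages FETCH [MAX_PAGES]  (hypothetical helper)
# Calls FETCH with page numbers 0, 1, 2, ... and stops at the first status
# other than 200. Prints how many pages returned 200. MAX_PAGES is a
# safety cap so a misbehaving server cannot keep the loop running.
pdc_count_pages() {
    fetch="$1"
    max_pages="${2:-500}"
    page=0
    while [ "$page" -lt "$max_pages" ]; do
        status=$("$fetch" "$page")
        [ "$status" = "200" ] || break
        page=$((page + 1))
    done
    echo "$page"
}

# Stand-in for a real request so the sketch runs offline: pretends the
# service has exactly three pages (0, 1 and 2).
fake_fetch() {
    if [ "$1" -lt 3 ]; then echo 200; else echo 404; fi
}

pdc_count_pages fake_fetch   # prints 3
```

In real use, the FETCH argument would be a function that calls curl with -s, -o and -w '%{http_code}' for the given page and prints only the status code.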

API version

If an API request is correctly structured but still fails, this indicates an issue with the endpoint itself. In most cases this is the result of the API service, or some needed dependency, being unavailable. However, it is also possible that the API has been updated to a newer version in which the endpoint is no longer available. Such revisions are generally avoided to prevent broken requests, but they might eventually become unavoidable. This can be confirmed by checking the API version information and the general documentation. Another potential issue is that the API could have migrated to another server.

CONCLUSION

The PDC Metadata API provides a service for acquiring metadata records using a variety of endpoints. The API can be used manually, or it can be invoked in an automated fashion. The API is intended to support activities such as ad hoc queries, automated metadata harvesting, programmatic data mining, and data visualization development. Ultimately, our goal is to provide easy access to project resources for all interested parties, in the spirit of the FAIR Data Principles.

APPENDIX

  1. JSON for Linking Data
  2. curl
  3. schema.org
  4. Schema Markup Validator
  5. R Project
  6. FAIR Data Principles

Acronyms

Acronym Description
PDC Polar Data Catalogue
API Application Programming Interface
JSON JavaScript Object Notation
JSON-LD JavaScript Object Notation for Linked Data