Ecotox (contaminant), biomarkers databases

The ecotox database is renamed the contaminant database in NPI's new data storage.

This repository contains the following databases:

  1. A fieldwork database for biologists including species measurements and samples.

  2. A database with ecotoxicology lab results.

  3. A database with biomarkers lab results.

Contact person(s): Env Data Section: siri.uldal@npolar.no, Ecotox biology: igor.eulaers@npolar.no

1. Contribution

1.1 Use standardized Excel schemas

Using standardized Excel schemas for data storage can save NPI time and money. Please ask if you need help or a schema modification. To get the schemas from GitLab, choose the “LDAP” pane and log in with the username “firstname.lastname” and your usual NPI login password.

NB! Plankton data goes into the plankton database, contact biology: anette.wold@npolar.no.

Biology fieldwork: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/fieldworkbiology/-/tree/main?ref_type=heads

Geoscience fieldwork: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/fieldworkgeoscience

Contaminant (prev ecotox) lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/contaminant

Prev storage: See Teams, the NILU-NPI data pipeline (Nd) - under General - Files - Templates - Transfer template

Biomarker lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/biomarker The biomarker database holds the remains of lab work, currently with an old schema. The database has an uncertain future.

Stable isotopes lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/stabiso Note: For the stable isotope schema you will need to enable the macro that Microsoft blocks by default. See https://support.microsoft.com/nb-no/topic/en-potensielt-farlig-makro-er-blokkert-0952faa0-37e7-4316-b61d-5b5ed6024216 (in Norwegian).

Fatty acids lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/fattyacid Under development.

1.2 Use standardized IDs

Good ID naming helps when locating samples in the freezer, and also when the same sample is used for several analyses or lab work.

For birds, use the bird's metal ring number if applicable. If not, use the following ID: date + species (2 letters of the Latin genus name followed by 2 letters of the Latin species name) + three-letter matrix name. F.ex. 230628-URLO-EGG-02 for a Uria lomvia egg sample. This ID currently suffers from the lack of matrix standardization. GBIF is working on standards, but until then, please contact the technical contact in the Environmental Data Section.
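
The ID pattern above can be sketched in Python as follows. This is only an illustration: the function name `build_sample_id` is hypothetical, the date is assumed to be written as YYMMDD (as the example 230628 suggests), the matrix code is assumed to be the first three letters of the matrix name in upper case, and the trailing two-digit number is assumed to be an optional running sample number.

```python
from datetime import date
from typing import Optional


def build_sample_id(collected: date, genus: str, species: str,
                    matrix: str, number: Optional[int] = None) -> str:
    """Build an ID such as 230628-URLO-EGG-02 (hypothetical helper)."""
    # Two letters from the genus name plus two letters from the species name
    species_code = (genus[:2] + species[:2]).upper()
    # Assumption: date as YYMMDD, matrix as its first three letters
    parts = [collected.strftime("%y%m%d"), species_code, matrix[:3].upper()]
    # Assumption: an optional running number, zero-padded to two digits
    if number is not None:
        parts.append(f"{number:02d}")
    return "-".join(parts)


print(build_sample_id(date(2023, 6, 28), "Uria", "lomvia", "egg", 2))
# prints 230628-URLO-EGG-02
```

Until matrix names are standardized, check the matrix code with the Environmental Data Section rather than relying on the three-letter shortcut used here.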

Polar bears have their own IDs taken from the polar bear database.

1.3 Put your data on a backed up disk

You will need access to disk area \\NPDATA\PROJECT\Ecotox\. Ask ruben.dens@npolar.no or mikhail.itkin@npolar.no.

\\NPDATA\PROJECT\Ecotox\RAW holds new, quality-checked research data from field trip(s) and lab data.

\\NPDATA\PROJECT\Ecotox\RAW also holds new lab data from old biology field samples. All older samples must include the database eventID to ensure that your new lab data are linked to the correct samples. Describe what you have done in a README file. Mark all new results with a yellow marker.

\\NPDATA\PROJECT\Ecotox\RAW\Corrected_older_research\ holds old research data from other researchers. Use directory naming as described below, or alternatively a title that is easy to understand, like species-your-name, f.ex. “arctic_fox_old_data_Heli_Routti”. Describe what you have done in the README file. Mark all updated results with a yellow marker. For older samples you must include the database eventID, and for older lab results the OccurrenceID; otherwise it will not be possible to include your updates. Remember to include the eventIDs of all deleted duplicates, otherwise it will not be possible to find the rows to delete. Only new entries should be without eventID and OccurrenceID.

\\NPDATA\PROJECT\Ecotox\PROCESSED\ECOTOX_2018\ Unsorted old Excel sheets from a previous study, which should be in the database but might not be.

\\NPDATA\PROJECT\Ecotox\PROCESSED\ECOTOX_2018\Data_ecotox_WORK\innleste filer_2018\ferdig The original data files for most of the research transferred to the ecotox database.

\\NPDATA\PROJECT\Ecotox\PROCESSED\ECOTOX_2023\ Files read and processed in 2023. RAW holds the Excel files received from the researchers.

\\NPDATA\PROJECT\Ecotox\WORKSPACE\Igor Eulars\ contains an ordered collection of the RAW excel files.

\\NPDATA\PROJECT\Ecotox\WORKSPACE\READ-TO-DATABASE\ contains the files ready to be read into the database.

1.4 Directory name and structure

Directory label should be XXXXX-project name, where XXXXX is the NPI project number. If this is not feasible, you may use your name as directory name.

Analyte names should be standardized according to the list found at https://data.npolar.no/dataset/b31ace10-2eae-4ec1-be1d-f15176c18c27

1.5 README file

If information is not in the data file(s), please include a README file with:

- explanation of units (preferably use one of “ng/g lw”, “ng/g”, “Bq/kg”, “ng/ml”, “µg Sn/kg”),
- explanation of other abbreviations,
- related publications,
- name of lab(s) and lab contact person,
- lab report number and lab sample numbers,
- lab report date,
- collaboration partners (if data is not owned by NPI alone),
- who collected the data,
- who was in charge of the research,
- date when data was collected,
- species name,
- matrix,
- project name (if applicable),
- ID for Research in Svalbard (if applicable),
- location where data was collected with decimal latitude and longitude including uncertainty in meters (<10, <100, <1000, <1000),
- whether your data is all new, based on older samples, or quality control of older data,
- DOI of any papers where your data are described. If you do not have a DOI, enclose the paper itself. Do NOT link to the NPI publication database or Brage, as these are or could become obsolete.
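
Before writing the README, it can help to see which units in your data fall outside the preferred list and therefore need an explanation. A minimal Python sketch, assuming you have already collected the unit strings from your data file; the helper name `non_preferred_units` is hypothetical:

```python
# The preferred units listed above
PREFERRED_UNITS = {"ng/g lw", "ng/g", "Bq/kg", "ng/ml", "µg Sn/kg"}


def non_preferred_units(units):
    """Return units not in the preferred list, in first-seen order, without repeats."""
    found = []
    for unit in units:
        cleaned = unit.strip()  # ignore stray whitespace from spreadsheet cells
        if cleaned not in PREFERRED_UNITS and cleaned not in found:
            found.append(cleaned)
    return found


# Made-up unit column for illustration
print(non_preferred_units(["ng/g lw", "pg/g", "Bq/kg ", "pg/g"]))
# prints ['pg/g']
```

Any unit the function returns is not forbidden, but should be explained in the README.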

1.6 Collaboration with NILU

NILU and NPI share a common schema; see the contaminant (prev ecotox) lab link above. Check with igor.eulaers@npolar.no.

There is also a contract attachment between NILU and NPI.

1.7 Other lab data

For people delivering to the marine/plankton database, NPI has a collaboration with IOPAN. At the moment, NPI does not have any collaboration with other labs on data format/storage.

2. Getting data back

The data is behind a login. This means that you cannot access the database through the browser, but must use tools like Curl or Python/Rlang. The login is the same as for all data accessed from https://data.npolar.no.

  1. Get the data, using https (Curl and Rlang descriptions below)

  2. Getting lab data from NILU

2.1 Using Kibana to get data (obsolete)

NPI has a version of the Kibana visualization software set up with the ecotox database, available only through NPI’s network:

NOTE! The server uses the http protocol, not the secure https. Some browsers, f.ex. Firefox, may have problems showing pages served over http without additional configuration. Edge or Chrome usually present the data with a warning asking if you want to continue loading an insecure connection. Say yes, as it is an NPI webpage.

This server can be used to search the ecotox database for data. Go to the “hamburger” menu in the upper left corner and select Analytics - Discover.

What to do next depends on what you want to know. If your question is of the type “give me all data managed by researcher x”, “give me all data from Bjørnøya between 2001-2004” or “give me all data on Larus hyperboreus from Jan Mayen”, choose biology-fielddata on the left side. To get data back you also need to set the time frame in the upper right corner to cover years back rather than the last 15 minutes. Choose Absolute and set the calendar dates.

You should now see that the number of Available fields on the left side has changed from zero to ca 41. Choose the fields that apply to your query, f.ex. “locality”.

F.ex. if you want the locality to be “Bjørnøya”, use the search field in the upper right corner. Write locality.keyword : "Bjørnøya" and press the blue refresh button on the upper right side. If there are any entries, they should now turn up. Similarly, you can choose other fields.

If your search goes the other way, like “give me all data on PCBs”, choose the biology-fielddata-ecotox database on the upper left side. Set the dates and pick the fields as described above to see what the database contains.

2.2 Get data back by using curl/https

From a Windows PC, download and install Curl.

Fieldwork database example:

curl "https://v2-api.npolar.no/biology/fielddata/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&verbose=true&page=.." -u your.email.address@npolar.no > fieldworkdata

Ecotox database example:

curl "https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&verbose=true&page=.." -u your.email.address@npolar.no > ecotoxdata

Remember to replace your.email.address@npolar.no with your own npolar.no email address.

You now have two downloaded JSON files. You can convert these into Excel by following the description for Windows.
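
The same two requests can be sketched in Python using only the standard library. This is a minimal sketch, not a tested client: it assumes the API accepts HTTP Basic authentication (which is what curl's -u option sends), and the names `build_search_url`, `download` and `main` are hypothetical.

```python
import base64
import getpass
import urllib.parse
import urllib.request

API_BASE = "https://v2-api.npolar.no/biology/fielddata"


def build_search_url(filters, ecotox=False, verbose=True):
    """Build a search URL from (field, value) pairs, one and= parameter per pair."""
    path = "/_all_/ecotox/_search" if ecotox else "/_search"
    params = [("and", f"{field}:{value}") for field, value in filters]
    if verbose:
        params.append(("verbose", "true"))
    params.append(("page", ".."))  # all data, not just the first page
    # safe=":" keeps the field:value colon readable; spaces become "+"
    return API_BASE + path + "?" + urllib.parse.urlencode(params, safe=":")


def download(url, user, password, out_path):
    """Fetch url with Basic authentication (assumption) and save the body to out_path."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())


def main():
    filters = [("scientificName", "Larus hyperboreus"),
               ("dynamicProperties.matrix", "plasma")]
    user = "your.email.address@npolar.no"  # replace with your own address
    password = getpass.getpass("NPI password: ")
    download(build_search_url(filters), user, password, "fieldworkdata")
    download(build_search_url(filters, ecotox=True), user, password, "ecotoxdata")
```

Calling `main()` prompts for the password and writes the same two files as the curl commands. `build_search_url` on its own is useful for checking that a query string matches the examples in this document before sending it.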

2.3 Get data back using Rlang

Code snippet to get all data into R:

# Download two databases: fieldwork (info about field trips and registered measurements)
# and lab (ecotox lab results). The complete download is too large to be merged in R,
# so the merge has to be done afterwards in Excel.

# Load libraries. If you don't have them, they can be installed with the command "install.packages('jsonlite')" etc.
library(jsonlite)

fieldwork_json = fromJSON("https://v2-api.npolar.no/biology/fielddata/?page=..&includeData=true")
lab_json = fromJSON("https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/?page=..&includeData=true")

# Traverse JSON hierarchy
fieldwork_df = fieldwork_json$items$data
lab_df = lab_json$items$data

# View the data and list the column names
View(fieldwork_df)
head(fieldwork_df, 5)
names(fieldwork_df)

# Flatten nested data frames into plain columns
field_df_flat = flatten(fieldwork_df, recursive = TRUE)
lab_df_flat = flatten(lab_df, recursive = TRUE)
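
For Python users, the flattening step can be sketched as below. It assumes each record is a nested dictionary like the entries under items/data in the JSON response, and writes nested keys as dotted column names (f.ex. dynamicProperties.matrix) to CSV text that Excel can open. The function names and the sample records are made up for illustration.

```python
import csv
import io


def flatten_record(record, prefix=""):
    """Turn nested dictionaries into one flat dictionary with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten_record(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def records_to_csv(records):
    """Return CSV text with one row per record and one column per flattened key."""
    rows = [flatten_record(record) for record in records]
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)  # keep first-seen column order
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up records for illustration only
sample = [
    {"eventID": "ev-1", "scientificName": "Larus hyperboreus",
     "dynamicProperties": {"matrix": "plasma"}},
    {"eventID": "ev-2", "scientificName": "Uria lomvia", "locality": "Bjørnøya"},
]
print(records_to_csv(sample))
```

Fields missing from a record are written as empty cells, so records with different sets of fields can share one sheet.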

2.4 How to locate and search data in the database

In order to find what you are looking for you need to know the parameter names.

The parameter names follow the Darwin Core standard; however, the ecotox database also has parameters under dynamicProperties that are tailored to NPI’s needs. Some of the most used fields in biology-fielddata:

scientificName: Species’ name in Latin.
dynamicProperties.matrix: Matrix from species.
locality: Placename.
datasetName: Dataset identifier, often connected to the project name or the name of the Excel sheet.
dynamicProperties.projectName: Name of project.
fieldNumber/identificationQualifier/dynamicProperties.indiviualIdentification: Usually the ID connected with the sample or individual, f.ex. as found in the freezer.
dynamicProperties.responsible: Researcher responsible for the data.
rightsHolder: The institution owning the data, usually NPI and institution partners.
eventDate: Usually date of measurement and/or sample collection.
recordedBy: The person responsible for the actual counting/collecting.
individualCount: The actual count itself.
decimalLatitude: Latitude, point or part of box.
decimalLongitude: Longitude, point or part of box.

For the dataset biology-fielddata-ecotox, many of the same variables exist, but also:

measurementDeterminedDate: Usually the date of the lab report.
measurementType: Type of toxic material, analyte.
measurementValue: The value measured in the matrix.
measurementUnit: The measured unit.
dynamicProperties.labReportId: Lab report identifier.

To merge biology-fielddata and biology-fielddata-ecotox, use the field:

eventID: The ID of the fieldwork event.
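
A merge on eventID can be sketched in Python as follows, assuming both downloads have already been read into lists of flat dictionaries and that each eventID occurs once in the fieldwork data. The function name `merge_on_event_id` and the sample values are made up for illustration.

```python
def merge_on_event_id(field_rows, lab_rows):
    """Attach the fieldwork row to every lab row that shares its eventID."""
    # Assumption: eventID is unique within the fieldwork data
    field_by_event = {row["eventID"]: row for row in field_rows}
    merged = []
    for lab in lab_rows:
        field = field_by_event.get(lab.get("eventID"))
        if field is None:
            continue  # lab result without a matching fieldwork event
        # Lab values win if both rows carry the same key
        merged.append({**field, **lab})
    return merged


# Made-up rows for illustration only
field_rows = [
    {"eventID": "ev-1", "locality": "Bjørnøya"},
    {"eventID": "ev-2", "locality": "Jan Mayen"},
]
lab_rows = [
    {"eventID": "ev-1", "measurementType": "PCB-153", "measurementValue": 12.5},
    {"eventID": "ev-1", "measurementType": "PCB-138", "measurementValue": 8.1},
    {"eventID": "ev-9", "measurementType": "PCB-153", "measurementValue": 3.0},
]
for row in merge_on_event_id(field_rows, lab_rows):
    print(row)
```

Each fieldwork event can have many lab results, so the merged list has one row per lab result. Lab results whose eventID has no fieldwork match are dropped here; depending on your purpose you may prefer to keep and inspect them.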

Missing a variable? The JSON schemas for fieldwork and ecotox hold all variables with short descriptions (you need to log in with your NPI login using LDAP):

https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/schemas/-/blob/main/v2-jsonschema/biology/fielddata/fielddata.v1.0.5.json

https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/schemas/-/blob/main/v2-jsonschema/biology/fielddata/ecotox.v1.0.5.json

Examples (use curl to log in, link is not available through the browser):

Ecotox fieldwork database, search for glaucous gulls and matrix plasma:

https://v2-api.npolar.no/biology/fielddata/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&verbose=true&page=..&type=feed

https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&page=..

Get all data for project MOSJ (only the field database has the parameter projectName):

https://v2-api.npolar.no/biology/fielddata/_search?and=dynamicProperties.projectName:MOSJ&verbose=true&page=..

&page=.. means all data, not just the first page.

&verbose=true means include all metadata as well.

&type=feed means download as nd-json.
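
A response downloaded with &type=feed can be read in Python roughly as follows. This sketch assumes the nd-json response is newline-delimited JSON with one record per line; the function name `parse_ndjson` and the sample text are made up for illustration.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON: one JSON document per non-empty line."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Made-up feed content for illustration only
sample_feed = '{"eventID": "ev-1"}\n\n{"eventID": "ev-2"}\n'
print(parse_ndjson(sample_feed))
# prints [{'eventID': 'ev-1'}, {'eventID': 'ev-2'}]
```

Reading line by line means a large download does not have to be one single valid JSON document, which is the usual reason for choosing this format.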

2.5 Project repository

Project repository with scripts to transform raw excel files (requires special access): https://gitlab.npolar.no/eds/other/ecotox