Contaminant and biomarkers databases

The ecotox database is renamed the contaminant database in NPI's new data storage.

This repository contains the following databases:

  1. A fieldwork database for biologists including species measurements and samples.

  2. A database with ecotoxicology lab results.

  3. A database with biomarkers lab results.

Contact person(s): Env Data Section: siri.uldal@npolar.no, Ecotox biology: heli.routti@npolar.no

1. Contribution

1.1 Use standardized Excel schemas

Using standardized Excel schemas for data storage can save NPI time and money. Please ask if you need help or a schema modification. To get the schemas from GitLab, choose the “LDAP” pane and log in with the username “firstname.lastname” and your usual NPI login password.

NB! Plankton data goes into the plankton database, contact biology: anette.wold@npolar.no.

Biology fieldwork: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/fieldworkbiology/-/tree/main?ref_type=heads

Geoscience fieldwork: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/fieldworkgeoscience

Contaminant (previously ecotox) lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/contaminant

Previous storage: See Teams, the NILU-NPI data pipeline (Nd) - under General - Files - Templates - Transfer template

Biomarker lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/biomarker
The biomarker data is what remains of lab work and currently uses an old schema. The database has an uncertain future.

Stable isotopes lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/stabiso
Note: For the stable isotope schema you will need to enable the macro, since Microsoft blocks macros by default. See https://support.microsoft.com/nb-no/topic/en-potensielt-farlig-makro-er-blokkert-0952faa0-37e7-4316-b61d-5b5ed6024216 (in Norwegian).

Fatty acids lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/fattyacid Under development.

1.2 Use standardized IDs

Good ID naming helps when locating samples in the freezer, and also when the same sample is used for several analyses or rounds of lab work.

For birds, use the bird's metal ring number if applicable. If not, use the following ID: date + species (2 letters of the Latin genus name followed by 2 letters of the Latin species name) + three-letter matrix name. For example, 230628-URLO-EGG-02 for a Uria lomvia egg sample. This ID currently suffers from the lack of matrix standardization. GBIF is working on standards, but until then, please contact the technical responsible at the Environmental Data Section.
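
As a minimal sketch of the naming rule above, the following Python function assembles an ID from its parts. The function name `build_sample_id`, the choice to take the matrix code as the first three letters of the matrix name, and the trailing two-digit running number (visible in the example ID but not described in the rule) are assumptions for illustration, not an NPI standard.

```python
from datetime import date


def build_sample_id(collected: date, scientific_name: str, matrix: str, number: int) -> str:
    """Assemble an ID such as 230628-URLO-EGG-02 (hypothetical helper)."""
    parts = scientific_name.split()
    if len(parts) < 2:
        raise ValueError("expected a two-part Latin name, e.g. 'Uria lomvia'")
    first, second = parts[0], parts[1]
    # Two letters from the first part of the Latin name, two from the second.
    species_code = (first[:2] + second[:2]).upper()
    # Assumption: the matrix code is the first three letters of the matrix name.
    matrix_code = matrix.strip()[:3].upper()
    # Assumption: the final element is a zero-padded running number.
    return f"{collected:%y%m%d}-{species_code}-{matrix_code}-{number:02d}"


print(build_sample_id(date(2023, 6, 28), "Uria lomvia", "egg", 2))
```

Because matrix names are not yet standardized, the three-letter matrix code is the part most likely to need manual adjustment.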

Polar bears have their own IDs taken from the polar bear database.

1.3 Put your data on a backed up disk

You will need access to disk area \\NPDATA\PROJECT\Ecotox\. Ask ruben.dens@npolar.no or mikhail.itkin@npolar.no.

\\NPDATA\PROJECT\Ecotox\RAW New, quality-checked research data from field trip(s) and lab data.

\\NPDATA\PROJECT\Ecotox\RAW New lab data from old biology field samples. All older samples must include the database eventID so that your new lab data is linked to the correct samples. Describe what you have done in a README file. Mark all new results with a yellow marker.

\\NPDATA\PROJECT\Ecotox\RAW\Corrected_older_research\ Old research data from other researchers. Name the directory as described below, or alternatively use a title that is easy to understand, such as species-your-name, e.g. “arctic_fox_old_data_Heli_Routti”. Describe what you have done in the README file. Mark all updated results with a yellow marker. For older samples you must include the database eventID, and for older lab results the OccurrenceID; otherwise it will not be possible to include your updates. Remember to include the eventIDs of all deleted duplicates; otherwise it will not be possible to find the rows to delete. Only new entries should be without eventID and OccurrenceID.

\\NPDATA\PROJECT\Ecotox\PROCESSED\ECOTOX_2018\ Unsorted old Excel sheets from a previous study, which should be in the database but might not be.

\\NPDATA\PROJECT\Ecotox\PROCESSED\ECOTOX_2018\Data_ecotox_WORK\innleste filer_2018\ferdig. The original data files for most of the research transferred to the ecotox database.

\\NPDATA\PROJECT\Ecotox\PROCESSED\ECOTOX_2023\ Files read and processed in 2023. RAW contains the Excel files received from the researchers.

\\NPDATA\PROJECT\Ecotox\WORKSPACE\Igor Eulars\ contains an ordered collection of the RAW Excel files.

\\NPDATA\PROJECT\Ecotox\WORKSPACE\READ-TO-DATABASE\ contains the files ready to be read into the database.

1.4 Directory name and structure

Directory label should be XXXXX-project name, where XXXXX is the NPI project number. If this is not feasible, you may use your name as directory name.

Analyte names should be standardized according to the list found at https://data.npolar.no/dataset/b31ace10-2eae-4ec1-be1d-f15176c18c27

1.5 README file

If the information is not in the data file(s), please include a README file with:

- units, explained (preferably use one of “ng/g lw”, “ng/g”, “Bq/kg”, “ng/ml”, “µg Sn/kg”),
- other abbreviations, explained,
- related publications,
- name of lab(s) and lab contact person,
- lab report number and lab sample numbers,
- lab report date,
- collaboration partners (if data is not owned by NPI alone),
- who collected the data,
- who was in charge of the research,
- date when data was collected,
- species name,
- matrix,
- project name (if applicable),
- ID for Research in Svalbard (if applicable),
- location where data was collected with decimal latitude and longitude including uncertainty in meters (<10, <100, <1000, <1000),
- whether your data is all new, based on older samples, or quality control of older data,
- DOI of any papers where your data are described. If you do not have a DOI, enclose the paper itself. Do NOT link to the NPI publication database or Brage, as these are or could become obsolete.
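
Before writing the README, it can help to check which units in a data file fall outside the preferred list. The following is a small sketch of such a check; the function name `non_preferred_units` is hypothetical, and the preferred units are copied from the list above.

```python
PREFERRED_UNITS = {"ng/g lw", "ng/g", "Bq/kg", "ng/ml", "µg Sn/kg"}


def non_preferred_units(units):
    """Return the units that are not on the preferred list, sorted and de-duplicated."""
    # Strip surrounding whitespace, which often sneaks in from spreadsheet cells.
    return sorted({unit.strip() for unit in units} - PREFERRED_UNITS)


print(non_preferred_units(["ng/g lw", "pg/g", "Bq/kg "]))
```

Any unit the check reports is not wrong as such, but it should be explained in the README.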

1.6 Collaboration with NILU

NILU and NPI share a common schema; see the contaminant (previously ecotox) lab link in section 1.1. Check with igor.eulars@npolar.no.

There is also a contract attachment between NILU and NPI.

1.7 Other lab data

For people delivering to the marine/plankton database, NPI has a collaboration with IOPAN. At the moment, NPI does not collaborate with any other labs on data format or storage.

2. Getting data back

The data is behind a login. This means you cannot access the database through the browser; instead, use tools such as curl, Python or R. The login is the same as for all data accessed from https://data.npolar.no.

  1. Download the data from https://data.npolar.no/dataset/0f295994-9b4a-41fb-8b4c-91b568dd554c/file

  2. Getting lab data from NILU

2.1 Download the latest version of the data

Download the latest version of the database from https://data.npolar.no/dataset/0f295994-9b4a-41fb-8b4c-91b568dd554c/file. There, source contains the raw (first version) files, and the entries starting with v2 (or another version prefix) are the available versions. Always use the latest version, if applicable.

Remember to log in - the database is not publicly available.

2.2 Get data back using R

Code snippet to get all data into R:

# Load two databases: fieldwork, with info about field trips and registered measurements,
# and lab, with ecotox lab results. The full download is too large to merge in R;
# the merge has to be done afterwards in Excel.

# Load libraries. If you don't have them, install them with install.packages("jsonlite") etc.
library(jsonlite)
library(writexl)

fieldwork_json <- fromJSON("v2Fieldworkbiology.json")
lab_json <- fromJSON("v2Ecotox.json")

# Traverse the JSON hierarchy: the records are under "results"
fieldwork_df <- fieldwork_json$results
lab_df <- lab_json$results

# Inspect the data and list the column names
View(fieldwork_df)      # opens a data viewer; intended for interactive sessions
head(fieldwork_df, 5)
names(fieldwork_df)

# Flatten nested fields (e.g., dynamicProperties.*)
field_df_flat <- jsonlite::flatten(fieldwork_df, recursive = TRUE)
lab_df_flat   <- jsonlite::flatten(lab_df, recursive = TRUE)

# Write each dataset to its own Excel file
write_xlsx(field_df_flat, "fieldwork.xlsx")
write_xlsx(lab_df_flat, "lab.xlsx")
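
For those who prefer Python, roughly the same steps can be sketched with the standard library only. Like the R snippet, it assumes that each downloaded file is a JSON object with the records under a `results` key. The helper names are hypothetical, and writing CSV instead of Excel is a choice made to avoid third-party packages.

```python
import csv
import json


def flatten(record, prefix=""):
    """Turn nested objects into dotted column names, e.g. dynamicProperties.matrix."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def load_results(path):
    """Read one downloaded JSON file and return its flattened records."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return [flatten(record) for record in data["results"]]


def write_csv(rows, path):
    """Write the rows to CSV; the header is the sorted union of all column names."""
    columns = sorted({name for row in rows for name in row})
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
```

Usage would then be, for example, `write_csv(load_results("v2Fieldworkbiology.json"), "fieldwork.csv")`. Columns missing from a record are left empty in the CSV.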

2.3 How to locate and search data in the database

To find what you are looking for, you need to know the parameter names.

The parameter names follow the Darwin Core standard; however, the ecotox database also has parameters under dynamicProperties that are tailored to NPI’s needs. Some of the most used fields in biology-fielddata:

scientificName: Species name in Latin.
dynamicProperties.matrix: Matrix from species.
locality: Place name.
datasetName: Dataset identifier, often connected to the project name or the name of the Excel sheet.
dynamicProperties.projectName: Name of project.
fieldNumber/identificationQualifier/dynamicProperties.indiviualIdentification: Usually the ID connected with the sample or individual, for example as found on samples in the freezer.
dynamicProperties.responsible: Researcher responsible for the data.
rightsHolder: The institution owning the data, usually NPI and institution partners.
eventDate: Usually date of measurement and/or sample collection.
recordedBy: The person responsible for the actual counting/collecting.
individualCount: The actual count itself.
decimalLatitude: Latitude, point or part of box.
decimalLongitude: Longitude, point or part of box.

For the dataset biology-fielddata-ecotox, many of the same variables exist, but also:

measurementDeterminedDate: Usually the date of the lab report.
measurementType: Type of toxic material, analyte.
measurementValue: The value measured in the matrix.
measurementUnit: The measured unit.
dynamicProperties.labReportId: Lab report identifier.

To merge biology-fielddata and biology-fielddata-ecotox, use the field:

eventID: The ID of the fieldwork event.
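
The merge on eventID can be sketched in Python as follows. This is a minimal illustration, assuming the records have already been loaded as lists of dictionaries and that each eventID occurs once in the fieldwork data; the function name and the sample values are made up for the example.

```python
def merge_on_event_id(fieldwork_rows, lab_rows):
    """Attach each lab result to its fieldwork record via the shared eventID."""
    # Assumption: one fieldwork record per eventID (a later duplicate would win).
    fieldwork_by_event = {row["eventID"]: row for row in fieldwork_rows}
    merged = []
    for lab in lab_rows:
        field = fieldwork_by_event.get(lab.get("eventID"))
        if field is None:
            continue  # skip lab results without a matching fieldwork event
        # On name clashes the lab value is kept.
        merged.append({**field, **lab})
    return merged


# Made-up sample records, for illustration only.
fieldwork = [{"eventID": "ev-1", "scientificName": "Larus hyperboreus"}]
lab = [
    {"eventID": "ev-1", "measurementType": "analyte-A", "measurementValue": 1.5},
    {"eventID": "ev-9", "measurementType": "analyte-A", "measurementValue": 0.2},
]
for row in merge_on_event_id(fieldwork, lab):
    print(row)
```

Lab results whose eventID has no fieldwork counterpart are dropped here; collecting them in a separate list instead makes it easier to spot missing or mistyped eventIDs.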

Missing a variable? The JSON schemas for fieldwork and ecotox hold all variables with short descriptions (you need to log in with your NPI login using LDAP):

https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/schemas/-/blob/main/v2-jsonschema/biology/fielddata/fielddata.v1.0.5.json

https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/schemas/-/blob/main/v2-jsonschema/biology/fielddata/ecotox.v1.0.5.json

Examples (use curl to log in; the links are not available through the browser):

Ecotox fieldwork database, search for glaucous gulls and matrix plasma:

https://v2-api.npolar.no/biology/fielddata/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&verbose=true&page=..&type=feed

https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&page=..

Get all data for project MOSJ (only the field database has the parameter projectName):

https://v2-api.npolar.no/biology/fielddata/_search?and=dynamicProperties.projectName:MOSJ&verbose=true&page=..

&page=.. means all data, not just the first page.

&verbose=true means include all metadata as well.

&type=feed means download as nd-json.
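
The search URLs above can also be assembled programmatically. The sketch below only builds the URL and parses an nd-json response body; it does not cover the login step, since that is not described here. The function names are hypothetical, while the base URL and query parameters are taken from the examples above.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://v2-api.npolar.no/biology/fielddata"


def build_search_url(filters, verbose=False, all_pages=True, feed=False, base=API_BASE):
    """Build a _search URL from (field, value) filter pairs."""
    params = [("and", f"{field}:{value}") for field, value in filters]
    if verbose:
        params.append(("verbose", "true"))  # include all metadata
    if all_pages:
        params.append(("page", ".."))       # all data, not just the first page
    if feed:
        params.append(("type", "feed"))     # download as nd-json
    # safe=":" keeps the field:value colon readable; spaces become "+".
    return f"{base}/_search?{urlencode(params, safe=':')}"


def parse_ndjson(text):
    """Parse newline-delimited JSON: one JSON document per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


url = build_search_url(
    [("scientificName", "Larus hyperboreus"), ("dynamicProperties.matrix", "plasma")],
    verbose=True,
    feed=True,
)
print(url)
```

For the lab results, pass `base="https://v2-api.npolar.no/biology/fielddata/_all_/ecotox"` to get the second example URL pattern.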

2.4 Scripts

Contaminant scripts to transform raw excel files (requires special access): https://gitlab.npolar.no/eds/other/ecotox

2.5 Gitlab template updates

See gitlab template