Contaminant and biomarkers databases
The ecotox database is renamed the contaminant database in NPI's new data storage.
This repository contains the following databases:
Contact person(s): Env Data Section: siri.uldal@npolar.no, Ecotox biology: heli.routti@npolar.no
1. Contribution
1.1 Use standardized Excel schemas
Using standardized Excel schemas for data storage saves NPI time and money. Please ask if you need help or a schema modification. To get the schemas from GitLab, choose the “LDAP” pane and log in with the username “firstname.lastname” and your usual NPI login password.
NB! Plankton data goes into the plankton database, contact biology: anette.wold@npolar.no.
Biology fieldwork: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/fieldworkbiology/-/tree/main?ref_type=heads
Geoscience fieldwork: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/fieldworkgeoscience
Contaminant (prev. ecotox) lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/contaminant
Previous storage: see Teams, the NILU-NPI data pipeline (Nd), under General - Files - Templates - Transfer template
Biomarker lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/biomarker The biomarker database holds what remains of earlier lab work and currently uses an old schema. The database has an uncertain future.
Stable isotopes lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/stabiso Note: For the stable isotope schema you will need to enable the macro that Microsoft blocks by default. See https://support.microsoft.com/nb-no/topic/en-potensielt-farlig-makro-er-blokkert-0952faa0-37e7-4316-b61d-5b5ed6024216 (in Norwegian).
Fatty acids lab: https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/templates/fattyacid Under development.
1.2 Use standardized IDs
Consistent ID naming makes samples easier to find in the freezer and easier to track when the same sample is used for several analyses or lab runs.
For birds, use the bird's metal ring number if applicable. If not, use the following ID: date + species (2 letters of the Latin genus name followed by 2 letters of the Latin species name) + three-letter matrix name. For example, 230628-URLO-EGG-02 for a Uria lomvia egg sample. This ID currently suffers from the lack of matrix standardization. GBIF is working on standards, but until then, please contact the technical responsible in the Environmental Data Section.
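The ID scheme above can be sketched in Python. This is only an illustration: the function name is hypothetical, the date is assumed to be written as YYMMDD, and the trailing two-digit running number is inferred from the example ID rather than stated in the rule. The matrix code is passed in as-is because matrix names are not yet standardized.

```python
from datetime import date


def build_sample_id(sampled: date, genus: str, species: str,
                    matrix_code: str, running_number: int) -> str:
    """Build a sample ID such as 230628-URLO-EGG-02 (hypothetical helper)."""
    if len(matrix_code) != 3:
        raise ValueError("matrix code must be exactly three letters")
    # Two letters of the genus followed by two letters of the species, upper case.
    taxon_code = (genus[:2] + species[:2]).upper()
    # Assumption: the date part is YYMMDD.
    return f"{sampled:%y%m%d}-{taxon_code}-{matrix_code.upper()}-{running_number:02d}"


print(build_sample_id(date(2023, 6, 28), "Uria", "lomvia", "egg", 2))
```

Building IDs with one shared function, rather than by hand, keeps the letter case and zero padding the same across field trips.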
Polar bears have their own IDs taken from the polar bear database.
1.3 Put your data on a backed up disk
You will need access to disk area \\NPDATA\PROJECT\Ecotox\. Ask ruben.dens@npolar.no or mikhail.itkin@npolar.no.
\\NPDATA\PROJECT\Ecotox\RAW: new, quality-checked research data from field trip(s) and lab data.
\\NPDATA\PROJECT\Ecotox\RAW: new lab data from old biology field samples.
All older samples must include the database eventID to ensure that your new lab data is linked to the correct samples.
Write what you have done in a README file. Mark all new results with a yellow marker.
\\NPDATA\PROJECT\Ecotox\RAW\Corrected_older_research\ old research data from other researchers. Use directory naming as described below, or alternatively use a title that is easy to understand, like species-your-name, e.g. “arctic_fox_old_data_Heli_Routti”.
Write what you have done in the README file. Mark all updated results with a yellow marker.
For older samples you must include the database eventID, and for older lab results the OccurrenceID; otherwise it will not be possible to include your updates.
Remember to include the eventIDs of all deleted duplicates; otherwise it will not be possible to find the rows to delete. Only new entries should be without eventID and OccurrenceID.
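Before delivering a file, the ID rule above can be checked with a few lines of Python. This is a minimal sketch, not part of the NPI tooling: the function name is hypothetical, and the column names `eventID` and `occurrenceID` are assumptions that should be adjusted to match the headers in your Excel schema.

```python
def classify_rows(rows, event_field="eventID", occurrence_field="occurrenceID"):
    """Split rows into updates (carry an ID) and new entries (carry none).

    The default column names are assumptions; pass the names used in your file.
    """
    updates, new_entries = [], []
    for row in rows:
        # An empty string or a missing column both count as "no ID".
        has_id = bool(row.get(event_field)) or bool(row.get(occurrence_field))
        (updates if has_id else new_entries).append(row)
    return updates, new_entries


rows = [
    {"eventID": "e1", "analyte": "PCB-153"},
    {"occurrenceID": "o7", "analyte": "PCB-153"},
    {"analyte": "PCB-153"},
]
updates, new_entries = classify_rows(rows)
print(len(updates), "updates,", len(new_entries), "new entries")
```

Any row that lands in the new-entries group but is meant as a correction or a deleted duplicate is missing its ID and cannot be matched against the database.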
\\NPDATA\PROJECT\Ecotox\PROCESSED\ECOTOX_2018\ Unsorted old Excel sheets from a previous study, which should be in the database but might not be.
\\NPDATA\PROJECT\Ecotox\PROCESSED\ECOTOX_2018\Data_ecotox_WORK\innleste filer_2018\ferdig. The original datafiles for most of the research transferred to the ecotox database.
\\NPDATA\PROJECT\Ecotox\PROCESSED\ECOTOX_2023\ Files read and processed in 2023. RAW holds the Excel files received from the researchers.
\\NPDATA\PROJECT\Ecotox\WORKSPACE\Igor Eulars\ contains an ordered collection of the RAW excel files.
\\NPDATA\PROJECT\Ecotox\WORKSPACE\READ-TO-DATABASE\ contains the files ready to be read into the database.
1.4 Directory name and structure
Directory label should be XXXXX-project name, where XXXXX is the NPI project number. If this is not feasible, you may use your name as directory name.
Analyte names should be standardized according to the list found at https://data.npolar.no/dataset/b31ace10-2eae-4ec1-be1d-f15176c18c27
1.5 README file
If information is not in the data file(s), please include a README file with:
1.6 Collaboration with NILU
NILU and NPI share a common schema; see the contaminant (prev. ecotox) lab link in section 1.1. Check with igor.eulars@npolar.no.
There is also a contract attachment between NILU and NPI.
1.7 Other lab data
For people delivering to the marine/plankton database, NPI has a collaboration with IOPAN. At the moment, NPI does not collaborate with any other labs on data format or storage.
2. Getting data back
The data is behind a login. This means that you cannot access the database through the browser; use tools like curl, Python, or R instead. The login is the same as for all data accessed from https://data.npolar.no.
Download the data from https://data.npolar.no/dataset/0f295994-9b4a-41fb-8b4c-91b568dd554c/file
Getting lab data from NILU
2.1 Download the latest version of the data
Download the latest version of the database from https://data.npolar.no/dataset/0f295994-9b4a-41fb-8b4c-91b568dd554c/file. There, source contains the raw (first version) files, and the entries whose names start with v2, or another version number, are the available versions. Always use the latest version, if applicable.
Remember to log in - the database is not publicly available.
2.2 Get data back using R
Code snippet to get all data into R:
# Load the two downloaded databases: fieldwork, with info about field trips and registered
# measurements, and lab, with ecotox lab results. Together they are too large to be merged
# in R - the merge has to be done afterwards in Excel.
# Load the libraries. If you don't have them, install them with install.packages("jsonlite") etc.
library(jsonlite)
library(writexl)
fieldwork_json <- fromJSON("v2Fieldworkbiology.json")
lab_json <- fromJSON("v2Ecotox.json")
# Traverse the JSON hierarchy
fieldwork_df <- fieldwork_json$results
lab_df <- lab_json$results
# Ways to inspect the data; names() lists the columns
View(fieldwork_df)
head(fieldwork_df, 5)
names(fieldwork_df)
# Flatten nested fields (e.g., dynamicProperties.*)
field_df_flat <- jsonlite::flatten(fieldwork_df, recursive = TRUE)
lab_df_flat <- jsonlite::flatten(lab_df, recursive = TRUE)
# Write each dataset to its own Excel file
write_xlsx(field_df_flat, "fieldwork.xlsx")
write_xlsx(lab_df_flat, "lab.xlsx")
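For readers who prefer Python, the same steps can be sketched with the standard library only. This is an illustrative equivalent, not a tested replacement for the R snippet: it assumes the same file names and the same top-level `results` key as above, writes CSV instead of Excel to avoid extra packages, and all function names are hypothetical.

```python
import csv
import json


def flatten(record, parent=""):
    """Flatten nested dicts into dotted keys, e.g. dynamicProperties.matrix."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def load_results(path):
    """Read a downloaded JSON file and return its flattened 'results' rows."""
    with open(path, encoding="utf-8") as handle:
        return [flatten(row) for row in json.load(handle)["results"]]


def write_csv(rows, path):
    """Write rows to CSV; columns are the union of all keys, sorted by name."""
    columns = sorted({name for row in rows for name in row})
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)


# Usage, once the files have been downloaded to the working directory:
# write_csv(load_results("v2Fieldworkbiology.json"), "fieldwork.csv")
# write_csv(load_results("v2Ecotox.json"), "lab.csv")
```

Rows that lack a column get an empty cell, so records with different sets of dynamicProperties can share one file.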
2.3 How to locate and search data in the database
To find what you are looking for, you need to know the parameter names.
The parameter names follow the Darwin Core standard; the ecotox database also has parameters under dynamicProperties that are tailored to NPI’s needs. Some of the most used fields in biology-fielddata:
For the dataset biology-fielddata-ecotox, many of the same variables exist, but also:
To merge biology-fielddata and biology-fielddata-ecotox, use the field:
Missing a variable? The JSON schemas for fieldwork and ecotox hold all variables with short descriptions (you need to log in with your NPI login using LDAP):
https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/schemas/-/blob/main/v2-jsonschema/biology/fielddata/fielddata.v1.0.5.json
https://gitlab.npolar.no/npdc/other/npdc-docs-public-repo/schemas/-/blob/main/v2-jsonschema/biology/fielddata/ecotox.v1.0.5.json
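Merging the two datasets on a shared field can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the join field is passed in as `key` rather than hard-coded, and the `eventID` used in the example is only an assumption based on the linking described in section 1.3. Check the schemas for the actual field.

```python
def merge_on(field_rows, lab_rows, key):
    """Attach the matching fieldwork row to every lab row (inner join on `key`).

    If several fieldwork rows share a key, the last one is used. On a column
    name clash the lab value overwrites the fieldwork value.
    """
    field_by_key = {row[key]: row for row in field_rows if row.get(key) is not None}
    merged = []
    for lab_row in lab_rows:
        field_row = field_by_key.get(lab_row.get(key))
        if field_row is not None:
            merged.append({**field_row, **lab_row})
    return merged


# Example with made-up rows; "eventID" as the join field is an assumption.
fieldwork = [{"eventID": "e1", "scientificName": "Larus hyperboreus"}]
lab = [{"eventID": "e1", "analyte": "PCB-153"}, {"eventID": "e9", "analyte": "PCB-153"}]
print(merge_on(fieldwork, lab, key="eventID"))
```

Lab rows without a matching fieldwork row are dropped here; keep them instead if you need to spot results that are not linked to a sample.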
Examples (use curl to log in, link is not available through the browser):
Ecotox fieldwork database, search for glaucous gulls and matrix plasma:
https://v2-api.npolar.no/biology/fielddata/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&verbose=true&page=..&type=feed
https://v2-api.npolar.no/biology/fielddata/_all_/ecotox/_search?and=scientificName:Larus+hyperboreus&and=dynamicProperties.matrix:plasma&page=..
Get all data for project MOSJ (only the field database has the parameter projectName):
https://v2-api.npolar.no/biology/fielddata/_search?and=dynamicProperties.projectName:MOSJ&verbose=true&page=..
&page=.. means all data, not just the first page.
&verbose=true means include all metadata as well.
&type=feed means download as nd-json.
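The search URLs above can also be assembled in Python instead of by hand. This sketch only builds the URL string; it does not log in or send a request, so the result still has to be passed to curl or an authenticated HTTP client. The function and parameter names are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://v2-api.npolar.no/biology/fielddata/_search"


def build_search_url(filters, verbose=False, all_pages=True, feed=False,
                     base_url=BASE_URL):
    """Build a search URL from (field, value) pairs combined with 'and'."""
    params = [("and", f"{field}:{value}") for field, value in filters]
    if verbose:
        params.append(("verbose", "true"))  # include all metadata
    if all_pages:
        params.append(("page", ".."))       # all data, not just the first page
    if feed:
        params.append(("type", "feed"))     # download as nd-json
    # safe=":" keeps the field:value separator readable; spaces become "+".
    return f"{base_url}?{urlencode(params, safe=':')}"


url = build_search_url(
    [("scientificName", "Larus hyperboreus"), ("dynamicProperties.matrix", "plasma")],
    verbose=True,
    feed=True,
)
print(url)
```

Pass a different `base_url` to build the same kind of query against the ecotox endpoint.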
2.4 Scripts
Contaminant scripts to transform raw excel files (requires special access): https://gitlab.npolar.no/eds/other/ecotox
2.5 Gitlab template updates
See gitlab template