Function to query the iDigBio API for specimen records
Usage
idig_search_records(
rq,
fields = FALSE,
max_items = 1e+05,
limit = 0,
offset = 0,
sort = FALSE,
...
)
Arguments
- rq
iDigBio record query in nested list format
- fields
vector of fields that will be contained in the data.frame, limited set returned by default, use "all" to get all indexed fields
- max_items
maximum number of results allowed to be retrieved (fail -safe)
- limit
maximum number of results returned
- offset
number of results to skip before returning results
- sort
vector of fields to use for sorting, UUID is always appended to make paging safe
- ...
additional parameters
Value
A data frame with fields requested or the following default fields:
UUID: Unique identifier assigned by iDigBio.
family - may be reassigned by iDigBio
genus - may be reassigned by iDigBio
scientificname - may be reassigned by iDigBio
country - may be modified by iDigBio
geopoint: Assigned by iDigBio.
datecollected: May be reassigned by iDigBio, see more here
collector: Assigned by iDigBio.
recordset: Assigned by iDigBio.
Details
Wraps idig_search
to provide defaults specific to searching
specimen records. Using this function instead of idig_search
directly is recommened.
Queries need to be specified as a nested list structure that will serialize to an iDigBio query object's JSON as expected by the iDigBio API: https://github.com/iDigBio/idigbio-search-api/wiki/Query-Format
As an example, the first sample query looks like this in JSON in the API documentation:
{
"scientificname": {
"type": "exists"
},
"family": "asteraceae"
}
To rewrite this in R for use as the rq parameter to
idig_search_records
or idig_search_media
, it would look like
this:
An example of a more complex JSON query with nested structures:
{
"geopoint": {
"type": "geo_bounding_box",
"top_left": {
"lat": 19.23,
"lon": -130
},
"bottom_right": {
"lat": -45.1119,
"lon": 179.99999
}
}
}
To rewrite this in R for use as the rq parameter, use nested calls to the list() function:
rq <- list(geopoint=list(
type="geo_bounding_box",
top_left=list(lat=19.23, lon=-130),
bottom_right=list(lat=-45.1119, lon= 179.99999)
)
)
See the Examples section below for more samples of simpler and more complex queries. Please refer to the API documentation for the full functionality availible in queries.
All matching results are returned up to the max_items cap (default 100,000).
If more results are wanted, a higher max_items can be passed as an option.
This API loads records 5,000 at a time using HTTP so performance with large
sets of data is not very good. Expect result sets over 50,000 records to
take tens of minutes. You can use the idig_count_records
or
idig_count_media
functions to find out how many records a
query will return; these are fast.
The iDigBio API will only return 5,000 records at a time but this function will automatically page through the results and return them all. Limit and offset are availible if manual paging of results is needed though the max_items cap still applies. The item count comes from the results header not the count of actual records in the limit/offset window.
Return is a data.frame containing the requested fields (or the default
fields). The columns in the data frame are untyped and no factors are pre-
built. Attribution and other metadata is attached to the dataframe in the
data.frame's attributes. (I.e. attributes(df)
)
Examples
if (FALSE) { # \dontrun{
# Simple example of retriving records in a genus:
idig_search_records(rq=list(genus="acer"), limit=10)
# This complex query shows that booleans passed to the API are represented
# as strings in R, fields used in the query don't have to be returned, and
# the syntax for accessing raw data fields:
idig_search_records(rq=list("hasImage"="true", genus="acer"),
fields=c("uuid", "data.dwc:verbatimLatitude"), limit=100)
# Searching inside a raw data field for a string, note that raw data fields
# are searched as full text, indexed fields are search with exact matches:
idig_search_records(rq=list("data.dwc:dynamicProperties"="parasite"),
fields=c("uuid", "data.dwc:dynamicProperties"), limit=100)
# Retriving a data.frame for use with MaxEnt. Notice geopoint is expanded
# to two columns in the data.frame: gepoint.lat and geopoint.lon:
df <- idig_search_records(rq=list(genus="acer", geopoint=list(type="exists")),
fields=c("uuid", "geopoint"), limit=10)
write.csv(df[c("uuid", "geopoint.lon", "geopoint.lat")],
file="acer_occurrences.csv", row.names=FALSE)
} # }