Publish v2
- class gladier_tools.publish.Publishv2(alias: str = None, alias_class: ToolAlias = None)
Bases: GladierBaseTool
Publishv2 extends the original Gladier publish tool, allowing a similar style of publication for files and folders without the globus-pilot requirement.
Publishv2 allows for specifying a file or folder on a Globus Collection, and “publishing” the data. Publication consists of first gathering metadata on the file or folder, cataloguing the metadata with Globus Search, and transferring the file or folder to a Globus Collection. Additional metadata may be provided for the ingest step, and several options exist for modifying what metadata is automatically gathered.
Dependencies:
None!
Optional Dependencies:
puremagic – Better mimetype detection
datacite – Validation of Datacite (dc) metadata
FuncX Functions:
Publishv2 uses one function, ‘publishv2_gather_metadata’. To use custom metadata generated by another function, it can be handy to have that function generate the entire ‘publishv2_gather_metadata’ input block and pass it as flow input instead, which can be done with the following modifier:
@generate_flow_definition(modifiers={
    'publishv2_gather_metadata': {'payload': '$.MyCustomPayload.details.results[0].output'},
})
This tool nests input under the ‘publishv2’ keyword. An example is below:
'publishv2': {
    'dataset': 'foo.txt',
    'destination': '/~/my-test-dir',
    'source_collection': 'my-source-globus-collection',
    'destination_collection': 'my-destination-globus-collection',
    'index': 'my-globus-search-index-uuid',
    'visible_to': ['public'],
    # Ingest and Transfer are disabled by default, allowing for 'dry-run' testing.
    # 'ingest_enabled': True,
    # 'transfer_enabled': True,
},
'funcx_endpoint_non_compute': '4b116d3c-1703-4f8f-9f6f-39921e5864df',
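As a rough illustration of the nesting described above, the input can be assembled programmatically. This is only a sketch: `build_publishv2_input` and its `extra` argument are hypothetical helpers, not part of gladier_tools, and how the resulting dictionary is wrapped and passed to a flow run is left out.

```python
from typing import Any, Dict, Optional


def build_publishv2_input(
    dataset: str,
    destination: str,
    source_collection: str,
    destination_collection: str,
    index: str,
    funcx_endpoint: str,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: nest tool options under the 'publishv2' key."""
    publishv2: Dict[str, Any] = {
        "dataset": dataset,
        "destination": destination,
        "source_collection": source_collection,
        "destination_collection": destination_collection,
        "index": index,
        "visible_to": ["public"],
    }
    # Optional parameters (for example 'metadata') are merged in last,
    # so they can also replace the defaults set above.
    publishv2.update(extra or {})
    return {
        "publishv2": publishv2,
        # The endpoint uuid sits beside the 'publishv2' block, not inside it.
        "funcx_endpoint_non_compute": funcx_endpoint,
    }


if __name__ == "__main__":
    flow_input = build_publishv2_input(
        dataset="foo.txt",
        destination="/~/my-test-dir",
        source_collection="my-source-globus-collection",
        destination_collection="my-destination-globus-collection",
        index="my-globus-search-index-uuid",
        funcx_endpoint="4b116d3c-1703-4f8f-9f6f-39921e5864df",
        extra={"metadata": {"title": "Example dataset"}},
    )
    print(sorted(flow_input["publishv2"]))
```

Keeping the endpoint uuid outside the nested block mirrors the example above, where 'funcx_endpoint_non_compute' appears at the same level as 'publishv2'.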
- Parameters:
dataset – Path to file or directory, which will be catalogued in Globus Search and transferred to the remote destination
destination – Location on destination collection where data should be stored
source_collection – The source Globus Collection where data is stored
destination_collection – The destination Collection to transfer the dataset
index – The index to ingest this dataset in Globus Search
visible_to – (list[str] Default: [‘public’]) A list of URN user or group identities for controlling access.
entry_id – (str Default: ‘metadata’) The entry id to use in the Globus Search record
metadata – (dict) Extra metadata to include in this search record
metadata_file – (str) An optional JSON metadata file to use for metadata. Will overwrite any existing “dc” or “files” generated content. Will be overridden by any values provided in “metadata”. Raises a ValueError if the JSON file cannot be loaded.
source_collection_basepath – Share path if this is a Guest Collection, so that the proper source path can be constructed for the transfer document
destination_url_hostname – Adds “https_url” to each file in the ‘files’ document using this provided hostname
checksum_algorithms – (tuple Default: (‘sha256’, ‘sha512’)) Checksums to use for file metadata
metadata_dc_validation_schema – (str) Schema used to validate datacite (dc) metadata. Possible values are (schema40, schema41, schema42, schema43). schema43 is recommended. Requires the datacite package to be installed on the funcX endpoint.
enable_publish – (bool Default: True) Enable the ingest step on the flow. If False, ingest will be skipped.
enable_transfer – (bool Default: True) Enable Transfer on the flow. If False, data will not be transferred to the remote collection.
enable_meta_dc – (bool Default: True) Generate datacite metadata during the ‘gathering’ funcX function step. Datacite metadata is stored under the ‘dc’ key, and can be validated using metadata_dc_validation_schema=schema43. If additional fields are provided via the metadata parameter, they will override overlapping fields.
enable_meta_files – (bool Default: True) Generate metadata on all files contained within the dataset. The ‘files’ document conforms to BDBag Remote File Manifests, generating a list of entries, one for each file, with keys: (‘url’, ‘sha256’, ‘sha512’, ‘filename’, ‘length’). Entries may also contain extended keys (‘mime_type’, ‘https_url’)
funcx_endpoint_non_compute – A funcX endpoint uuid for gathering metadata.
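To make the shape of a ‘files’ entry more concrete, the sketch below builds one entry with the keys listed for enable_meta_files, using only the Python standard library. It is not the tool's implementation: `describe_file` is a hypothetical helper, the ‘url’ value is simply passed in by the caller, the standard `mimetypes` module stands in for puremagic, and the way ‘https_url’ is joined from the hostname and path is an assumption.

```python
import hashlib
import mimetypes
from pathlib import Path
from typing import Any, Dict, Iterable, Optional


def describe_file(
    path: Path,
    url: str,
    checksum_algorithms: Iterable[str] = ("sha256", "sha512"),
    destination_url_hostname: Optional[str] = None,
    destination_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build one manifest-style entry for a file."""
    hashers = {name: hashlib.new(name) for name in checksum_algorithms}
    # Read the file once in chunks and feed every hasher, so large
    # files do not need to fit in memory.
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            for hasher in hashers.values():
                hasher.update(chunk)

    entry: Dict[str, Any] = {
        "url": url,
        "filename": path.name,
        "length": path.stat().st_size,
    }
    entry.update({name: h.hexdigest() for name, h in hashers.items()})

    # Extended key: mimetype guessed from the file extension only.
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type:
        entry["mime_type"] = mime_type

    # Extended key: assumed layout of https://<hostname>/<path>.
    if destination_url_hostname and destination_path:
        entry["https_url"] = (
            f"https://{destination_url_hostname}/{destination_path.lstrip('/')}"
        )
    return entry
```

Passing a different tuple for checksum_algorithms changes which digest keys appear in the entry; any algorithm name accepted by `hashlib.new` works in this sketch.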