Publish v2

class gladier_tools.publish.Publishv2(alias: str = None, alias_class: ToolAlias = None)

Bases: GladierBaseTool

Publishv2 is an extension of the original publish Gladier tool. It allows similar publication of files and folders without requiring globus-pilot.

Publishv2 allows for specifying a file or folder on a Globus Collection, and “publishing” the data. Publication consists of three steps: gathering metadata on the file or folder, cataloguing the metadata with Globus Search, and transferring the file or folder to a Globus Collection. Additional metadata may be provided for the ingest step, and several options exist for modifying what metadata is automatically gathered.

Dependencies:

  • None!

Optional Dependencies:

  • puremagic – Better mimetype detection

  • datacite – Validation of Datacite (dc) metadata

FuncX Functions:

Publishv2 uses one function called ‘publishv2_gather_metadata’. To use custom metadata generated by another function, it can be handy to have that function generate the entire ‘publishv2_gather_metadata’ input block and pass its output as the payload instead of the flow input, which can be done via the following:

@generate_flow_definition(modifiers={
    'publishv2_gather_metadata': {'payload': '$.MyCustomPayload.details.results[0].output'},
})
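
As a minimal sketch of the approach above, a custom function could return a dictionary shaped like the ‘publishv2’ input block, which the modifier then passes to ‘publishv2_gather_metadata’ as its payload. The function name `my_custom_payload` and the `run_name` field are hypothetical. The sketch assumes that the state name `MyCustomPayload` is derived from the function name, that the function receives the flow input as keyword arguments, and that the payload accepts the same keys as the parameters documented for this tool.

```python
def my_custom_payload(**data):
    """Hypothetical custom function that builds the publishv2_gather_metadata payload.

    Assumption: the payload uses the same keys as the 'publishv2' input block.
    """
    # Copy the publishv2 block from the flow input so the original is not mutated.
    payload = dict(data.get('publishv2', {}))
    # Illustrative custom metadata produced by this step (hypothetical field).
    custom = {'run_name': data.get('run_name', 'unnamed-run')}
    # Custom values are added on top of any metadata already supplied.
    payload['metadata'] = {**payload.get('metadata', {}), **custom}
    return payload


# Example call with placeholder values taken from this page.
example_payload = my_custom_payload(
    publishv2={'dataset': 'foo.txt', 'destination': '/~/my-test-dir'},
    run_name='demo',
)
print(example_payload)
```

The returned dictionary is what the path ‘$.MyCustomPayload.details.results[0].output’ would be expected to select under these assumptions.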

This tool nests input under the ‘publishv2’ keyword. An example is below:

'publishv2': {
    'dataset': 'foo.txt',
    'destination': '/~/my-test-dir',
    'source_collection': 'my-source-globus-collection',
    'destination_collection': 'my-destination-globus-collection',
    'index': 'my-globus-search-index-uuid',
    'visible_to': ['public'],
    # Ingest and Transfer are disabled by default, allowing for 'dry-run' testing.
    # 'ingest_enabled': True,
    # 'transfer_enabled': True,
},
'funcx_endpoint_non_compute': '4b116d3c-1703-4f8f-9f6f-39921e5864df',
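
For context, the fragment above could be assembled into a complete flow input as sketched here. The helper name `build_flow_input` is hypothetical, and wrapping the values under a top-level 'input' key is an assumption based on the usual Gladier flow input layout. The collection, index, and endpoint values are the placeholders from the example above.

```python
def build_flow_input(publishv2, endpoint_id):
    """Hypothetical helper: wrap a publishv2 block into a flow input dictionary.

    Assumption: Gladier flows read their values from a top-level 'input' key.
    """
    # Keys listed without a default in the parameter list are treated as required here.
    required = ('dataset', 'destination', 'source_collection',
                'destination_collection', 'index')
    missing = [key for key in required if key not in publishv2]
    if missing:
        raise KeyError(f'publishv2 input is missing: {missing}')
    return {
        'input': {
            'publishv2': dict(publishv2),
            'funcx_endpoint_non_compute': endpoint_id,
        }
    }


flow_input = build_flow_input(
    {
        'dataset': 'foo.txt',
        'destination': '/~/my-test-dir',
        'source_collection': 'my-source-globus-collection',
        'destination_collection': 'my-destination-globus-collection',
        'index': 'my-globus-search-index-uuid',
        'visible_to': ['public'],
    },
    endpoint_id='4b116d3c-1703-4f8f-9f6f-39921e5864df',
)
print(flow_input['input']['publishv2']['dataset'])
```

Checking for the required keys before starting a flow is a design choice of this sketch, not something the tool is documented to do.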

Parameters:
  • dataset – Path to file or directory, which will be catalogued in Globus Search and transferred to the remote destination

  • destination – Location on destination collection where data should be stored

  • source_collection – The source Globus Collection where data is stored

  • destination_collection – The destination Globus Collection to which the dataset is transferred

  • index – The Globus Search index into which this dataset is ingested

  • visible_to – (list[str] Default: [‘public’]) A list of URN user or group identities for controlling access.

  • entry_id – (str Default: ‘metadata’) The entry id to use in the Globus Search record

  • metadata – (dict) Extra metadata to include in this search record

  • metadata_file – (str) An optional JSON metadata file to use for metadata. Its contents overwrite any generated “dc” or “files” content, and are in turn overridden by any values provided in “metadata”. Raises a ValueError if the JSON file cannot be loaded.

  • source_collection_basepath – Share path if this is a Guest Collection, so that the proper source path can be constructed for the transfer document

  • destination_url_hostname – Adds “https_url” to each file in the ‘files’ document using this provided hostname

  • checksum_algorithms – (tuple Default: (‘sha256’, ‘sha512’)) Checksums to use for file metadata

  • metadata_dc_validation_schema – (str) Schema used to validate datacite (dc) metadata. Possible values are (schema40, schema41, schema42, schema43); schema43 is recommended. Requires the datacite package to be installed on the funcX endpoint.

  • enable_publish – (bool Default: True) Enable the ingest step on the flow. If False, ingest will be skipped.

  • enable_transfer – (bool Default: True) Enable Transfer on the flow. If False, data will not be transferred to the remote collection.

  • enable_meta_dc – (bool Default: True) Generate datacite metadata during the ‘gathering’ funcX function step. Datacite metadata is stored under the ‘dc’ key, and can be validated using metadata_dc_validation_schema=schema43. If additional fields are provided via the metadata parameter, they will override overlapping fields.

  • enable_meta_files – (bool Default: True) Generate metadata on all files contained within the dataset. The ‘files’ document conforms to BDBag Remote File Manifests, with one entry per file containing the keys (‘url’, ‘sha256’, ‘sha512’, ‘filename’, ‘length’). Entries may also contain the extended keys (‘mime_type’, ‘https_url’).

  • funcx_endpoint_non_compute – A funcX endpoint uuid for gathering metadata.
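
The precedence described for metadata_file and metadata can be illustrated with a minimal sketch. This is not the tool's implementation: the function name `merge_metadata` is hypothetical, and a shallow, top-level merge is an assumption, since the parameter descriptions do not say whether nested keys are merged.

```python
import json
import os
import tempfile


def merge_metadata(generated, metadata_file=None, metadata=None):
    """Hypothetical sketch: combine metadata sources in the documented order.

    Order of precedence (lowest to highest): generated content,
    the contents of metadata_file, then the explicit metadata dict.
    Assumption: the merge is shallow (top-level keys only).
    """
    record = dict(generated)
    if metadata_file is not None:
        try:
            with open(metadata_file) as f:
                from_file = json.load(f)
        except (OSError, json.JSONDecodeError) as err:
            # Mirrors the documented ValueError when the file cannot be loaded.
            raise ValueError(f'Unable to load JSON file: {metadata_file}') from err
        if not isinstance(from_file, dict):
            raise ValueError(f'Expected a JSON object in: {metadata_file}')
        # File content replaces generated keys such as 'dc' or 'files'.
        record.update(from_file)
    # Explicit metadata has the final say.
    record.update(metadata or {})
    return record


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'meta.json')
    with open(path, 'w') as f:
        json.dump({'dc': {'titles': [{'title': 'From file'}]}, 'project': 'file'}, f)
    merged = merge_metadata(
        {'dc': {'titles': [{'title': 'Generated'}]}, 'files': []},
        metadata_file=path,
        metadata={'project': 'explicit'},
    )
print(merged)
```

In this example the ‘dc’ block comes from the file, ‘files’ keeps its generated value, and ‘project’ takes the value passed in metadata.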