Publish

Publish takes files or directories, transfers them to a publication endpoint, and ingests metadata about them into Globus Search. Both the index in Globus Search and the Globus endpoint in Globus Pilot must be set up first.

Setup only needs to be done once; afterwards, publish can be used freely. Both Globus Pilot and the Globus Search CLI can be installed with the following:

pip install globus-search-cli globus-pilot

Globus Search CLI is responsible for setting up indices in Globus Search. Globus Pilot is a tool which generates metadata about files and directories and handles both the transfer to an endpoint and the ingest to search.

Create the index with the following:

globus-search index create my-index

Set up your publication endpoint with the index you created above:

globus-pilot index setup <UUID from the step above>

After that, you should be ready to publish your data. See the documentation for both of the tools above for more details.

class gladier_tools.publish.Publish(alias: str = None, alias_class: ToolAlias = None)

Bases: GladierBaseTool

This tool uses globus-pilot to generate metadata compatible with portals on https://acdc.alcf.anl.gov/. Requires globus_pilot>=0.6.0.

FuncX Functions:

  • publish_gather_metadata (funcx_endpoint_non_compute)

Publication happens in three steps:

  • PublishGatherMetadata – A funcx function which uses globus-pilot to gather metadata on files or folders

  • PublishTransfer – Transfers data to the Globus Endpoint selected in Globus Pilot

  • PublishIngest – Ingests the metadata gathered in the first step into Globus Search

Note: This tool needs internet access to fetch Pilot configuration records, which contain the destination endpoint and other project info. The default FuncX endpoint name is funcx_endpoint_non_compute. You can change this with the following modifier:

@generate_flow_definition(modifiers={
    'publish_gather_metadata': {'endpoint': 'funcx_endpoint_non_compute'},
})

More details on modifiers can be found at https://gladier.readthedocs.io/en/latest/gladier/flow_generation.html

NOTE: This tool nests input under the ‘pilot’ keyword. Submit your input as the following:

{
    'input': {
        'pilot': {
            'dataset': 'foo',
            'index': 'my-search-index-uuid',
            'project': 'my-pilot-project',
            'source_globus_endpoint': 'ddb59aef-6d04-11e5-ba46-22000b92c6ec',
        }
    }
}
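Because every option has to sit under input -> pilot, it can help to build the payload in one place. The following is a minimal sketch, not part of gladier_tools: build_publish_input and REQUIRED_KEYS are hypothetical names, and treating the four keys from the example as required is an assumption, since the documentation does not state which options are mandatory.

```python
# Assumption: the four keys shown in the example are treated as required.
# The tool's documentation does not say which options are mandatory.
REQUIRED_KEYS = ("dataset", "index", "project", "source_globus_endpoint")


def build_publish_input(**pilot_options):
    """Nest Publish options under input -> pilot (hypothetical helper)."""
    # Collect any required option that is absent or empty.
    missing = [key for key in REQUIRED_KEYS if not pilot_options.get(key)]
    if missing:
        raise ValueError(f"missing pilot options: {', '.join(missing)}")
    # Copy the options so later changes by the caller do not leak in.
    return {"input": {"pilot": dict(pilot_options)}}


flow_input = build_publish_input(
    dataset="foo",
    index="my-search-index-uuid",
    project="my-pilot-project",
    source_globus_endpoint="ddb59aef-6d04-11e5-ba46-22000b92c6ec",
)
```

Optional settings such as destination or groups can be passed as extra keyword arguments and end up in the same nested dictionary.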
Parameters:
  • dataset – Path to file or directory. Used by Pilot to gather metadata, and set as the source for transfer to the publication endpoint configured in Pilot.

  • destination – Relative location under the project directory to place the dataset (Default: /)

  • source_globus_endpoint – The Globus Endpoint of the machine where you are executing

  • source_collection_basepath – If using a guest collection, the posix path of the guest collection. Used to translate source paths for the transfer step. (Default: /)

  • index – The index to ingest this dataset in Globus Search

  • project – The Pilot project to use for this dataset

  • groups – A list of additional groups to make these records visible_to.

  • funcx_endpoint_non_compute – A funcX endpoint uuid for gathering metadata. Requires internet access.

Requires: the ‘globus-pilot’ package to be installed.
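The role of source_collection_basepath can be illustrated with a small path calculation. This is a sketch of one plausible reading of "translate source paths", not the code Publish actually runs: to_collection_path is a hypothetical helper, and it assumes the transfer step needs the dataset path expressed relative to the root of the guest collection.

```python
from pathlib import PurePosixPath


def to_collection_path(dataset, source_collection_basepath="/"):
    """Return `dataset` as seen from a collection rooted at the basepath.

    Hypothetical helper: it illustrates the idea only and is not taken
    from gladier_tools. Raises ValueError if the dataset lies outside
    the basepath.
    """
    relative = PurePosixPath(dataset).relative_to(source_collection_basepath)
    # Re-anchor the relative part at the collection root.
    return str(PurePosixPath("/") / relative)


# A guest collection rooted at /projects exposes /projects/data/foo as /data/foo.
guest_path = to_collection_path("/projects/data/foo", "/projects")

# With the default basepath of /, the path is unchanged.
plain_path = to_collection_path("/projects/data/foo")
```

With the default basepath the function is a no-op, which matches the documented default of / for collections that expose the whole filesystem.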