DatasetOperations

Access via client.datasets.

crucible.resources.datasets.DatasetOperations

Dataset-related API operations.

Access via: client.datasets.get(), client.datasets.list(), etc. File operations (upload, download, thumbnails, ingestion) are inherited from FileOperations and also available via client.files.*.

get(dsid, include_metadata=False, include_links=False)

Get dataset details, optionally including scientific metadata and links.

Parameters:

Name Type Description Default
dsid str

Dataset unique identifier

required
include_metadata bool

Whether to include scientific metadata

False
include_links bool

Whether to include immediate parent/child/associated links

False

Returns:

Name Type Description
Dict Dict

Dataset object with optional metadata and links
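
A minimal sketch of calling get() with both optional flags enabled. It assumes `client` is an already-configured client object exposing `client.datasets`. The helper name `fetch_dataset` is hypothetical, and because the keys of the returned dict are not specified here, the sketch returns it unchanged.

```python
def fetch_dataset(client, dsid):
    """Fetch one dataset together with its scientific metadata and immediate links."""
    # Both flags default to False, so they are requested explicitly here.
    return client.datasets.get(dsid, include_metadata=True, include_links=True)
```

Leaving both flags at their defaults returns the dataset details without metadata or links.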

list(sample_id=None, include_metadata=False, include_links=False, limit=DEFAULT_LIMIT, offset=0, **kwargs)

List datasets with optional filtering and automatic pagination.

Parameters:

Name Type Description Default
sample_id str

If provided, returns datasets for this sample

None
limit int

Maximum total results to return (default: 100). Requests above API_PAGE_MAX (1000) are handled transparently via parallel pagination.

DEFAULT_LIMIT
offset int

Starting position in the full result set (default: 0)

0
include_metadata bool

Include scientific metadata in results

False
include_links bool

Include linked resources (parents, children, associated) per dataset

False
**kwargs Any

Query parameters for filtering. Supported fields include: keyword, unique_id, public, dataset_name, file_to_upload, owner_orcid, project_id, instrument_name, source_folder, timestamp, size, data_format, data_type, measurement, session_name, sha256_hash_file_to_upload. Filters require exact, case-sensitive matches, except for keyword, which is case-insensitive and matches substrings.

{}

Returns:

Type Description
List[Dict]

List[Dict]: Dataset objects matching filter criteria
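
The filtering and paging parameters above might be combined as in the following sketch. It assumes `client` is an already-configured client object. The helper names `list_project_datasets` and `iter_dataset_pages` are hypothetical, and the manual paging loop is only one possible way to use limit and offset when results should be processed a page at a time.

```python
def list_project_datasets(client, project_id, keyword=None, limit=250):
    """List up to `limit` datasets in a project, optionally narrowed by keyword."""
    filters = {"project_id": project_id}  # exact, case-sensitive match
    if keyword is not None:
        filters["keyword"] = keyword  # case-insensitive substring match
    return client.datasets.list(limit=limit, **filters)


def iter_dataset_pages(client, page_size=100, **filters):
    """Yield successive pages of datasets by advancing `offset` manually."""
    offset = 0
    while True:
        page = client.datasets.list(limit=page_size, offset=offset, **filters)
        if not page:
            return
        yield page
        if len(page) < page_size:
            # A short page means the result set is exhausted.
            return
        offset += page_size
```

For most uses a single list() call with a larger limit is enough, since pagination is handled automatically.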

create(dataset, scientific_metadata=None, keywords=None, files_to_upload=None, ingestor=None, verbose=False, wait_for_ingestion_response=False)

Create a new dataset record with scientific metadata and keywords.

Parameters:

Name Type Description Default
dataset Dataset

Dataset object with dataset details

required
scientific_metadata dict

Scientific metadata

None
keywords list

Keywords to associate with dataset

None

Returns:

Name Type Description
Dict Dict

Dict containing created_record, scientific_metadata_record, and dsid
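
A minimal sketch of creating a record and reading back its identifier. It assumes `client` is an already-configured client object and that `dataset` is a Dataset object built elsewhere (its constructor is not documented here). The helper name `create_dataset_record` is hypothetical, and treating `dsid` as a key of the returned dict is an assumption based on the return description.

```python
def create_dataset_record(client, dataset, scientific_metadata=None, keywords=None):
    """Create a dataset record and return the new dataset's identifier."""
    result = client.datasets.create(
        dataset,
        scientific_metadata=scientific_metadata,
        keywords=keywords,
    )
    # Assumption: the returned dict exposes the new identifier under "dsid".
    return result["dsid"]
```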

update(dsid, **updates)

Update an existing dataset with new field values.

Parameters:

Name Type Description Default
dsid str

Dataset unique identifier

required
**updates Any

Fields to update (e.g., dataset_name="New Name", public=True)

{}

Returns:

Name Type Description
Dict Dict

Updated dataset object

Example

client.datasets.update("my-dataset-id", dataset_name="Updated Name", public=True)

delete(dsid)

Delete a dataset.

Parameters:

Name Type Description Default
dsid str

Dataset unique identifier

required

Returns:

Name Type Description
Dict Dict

Deletion confirmation

add_keyword(dsid, keyword)

Add a keyword to a dataset.

Parameters:

Name Type Description Default
dsid str

Dataset unique identifier

required
keyword str

Keyword/tag to associate with dataset

required

Returns:

Name Type Description
Dict Dict

Keyword object with updated usage count

get_keywords(dsid=None, limit=DEFAULT_LIMIT)

List keywords, optionally filtered by dataset.

Parameters:

Name Type Description Default
dsid str

Dataset unique identifier to filter keywords

None
limit int

Maximum number of results to return

DEFAULT_LIMIT

Returns:

Type Description
List[Dict]

List[Dict]: Keyword objects with keyword text and num_datasets counts
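
The two keyword methods could be used together as sketched below. It assumes `client` is an already-configured client object. The helper name `tag_dataset` is hypothetical, and reading the keyword text from a "keyword" key is an assumption about the shape of the returned keyword objects.

```python
def tag_dataset(client, dsid, keywords):
    """Attach each keyword to the dataset, then return the dataset's keyword texts."""
    for keyword in keywords:
        client.datasets.add_keyword(dsid, keyword)
    records = client.datasets.get_keywords(dsid=dsid)
    # Assumption: each keyword object stores its text under "keyword".
    return [record["keyword"] for record in records]
```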

add_sample(dataset_id, sample_id)

Link a sample to a dataset.

Parameters:

Name Type Description Default
dataset_id str

Dataset unique identifier

required
sample_id str

Sample unique identifier

required

Returns:

Name Type Description
Dict Dict

Information about the created link

remove_sample(dataset_id, sample_id)

Remove the link between a dataset and a sample.

Requires admin permissions.

Parameters:

Name Type Description Default
dataset_id str

Dataset unique identifier

required
sample_id str

Sample unique identifier

required

Returns:

Name Type Description
Dict Dict

Deletion confirmation
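
As an illustration of add_sample() and remove_sample() used together, the sketch below moves a dataset from one sample to another. It assumes `client` is an already-configured client object whose credentials carry the admin permissions that remove_sample() requires. The helper name `relink_sample` is hypothetical.

```python
def relink_sample(client, dataset_id, old_sample_id, new_sample_id):
    """Point a dataset at a new sample and drop the link to the old one."""
    # Create the new link first so the dataset is never left without a sample.
    created = client.datasets.add_sample(dataset_id, new_sample_id)
    # remove_sample() requires admin permissions.
    client.datasets.remove_sample(dataset_id, old_sample_id)
    return created
```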

Link a derived dataset to a parent dataset.

Parameters:

Name Type Description Default
parent_dataset_id str

The unique ID for the parent dataset

required
child_dataset_id str

The unique ID for the derived dataset

required

Returns:

Name Type Description
Dict Dict

Information about the created link

remove_child(parent_dataset_id, child_dataset_id)

Remove the parent-child link between two datasets.

Parameters:

Name Type Description Default
parent_dataset_id str

The unique ID of the parent dataset

required
child_dataset_id str

The unique ID of the child dataset

required

Returns:

Name Type Description
Dict Dict

Deletion confirmation

list_parents(child_dataset_id, limit=DEFAULT_LIMIT, offset=0, **kwargs)

List the parents of a given dataset with optional filtering.

Parameters:

Name Type Description Default
child_dataset_id str

The unique ID of the dataset for which you want to find the parents

required
limit int

Maximum number of results to return

DEFAULT_LIMIT
offset int

Starting position in the full result set (default: 0)

0
**kwargs Any

Query parameters for filtering datasets

{}

Returns:

Type Description
List[Dict]

List[Dict]: Parent datasets

list_children(parent_dataset_id, limit=DEFAULT_LIMIT, offset=0, **kwargs)

List the children of a given dataset with optional filtering.

Parameters:

Name Type Description Default
parent_dataset_id str

The unique ID of the dataset for which you want to find the children

required
limit int

Maximum number of results to return

DEFAULT_LIMIT
offset int

Starting position in the full result set (default: 0)

0
**kwargs Any

Query parameters for filtering datasets

{}

Returns:

Type Description
List[Dict]

List[Dict]: Children datasets
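
list_parents() returns only direct parents, so a full ancestry has to be assembled by repeated calls. The sketch below shows one way to do that with a breadth-first walk. It assumes `client` is an already-configured client object. The helper name `collect_ancestors` is hypothetical, and reading each dataset's identifier from a "unique_id" key is an assumption, so the key is exposed as a parameter.

```python
from collections import deque


def collect_ancestors(client, dataset_id, id_key="unique_id"):
    """Return the IDs of all ancestors of a dataset in breadth-first order."""
    seen = []
    queue = deque([dataset_id])
    while queue:
        current = queue.popleft()
        for parent in client.datasets.list_parents(current):
            # Assumption: each dataset dict carries its identifier under `id_key`.
            parent_id = parent[id_key]
            if parent_id not in seen:
                seen.append(parent_id)
                queue.append(parent_id)
    return seen
```

The same walk works for descendants by swapping in list_children().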

graph(dataset_id, recursive=False, as_networkx=False)

Return the graph of entities connected to this dataset.

Delegates to client.graphs.get(). See GraphOperations.get() for full docs.

Parameters:

Name Type Description Default
dataset_id str

Dataset unique identifier.

required
recursive bool

If True, traverse the full connected component.

False
as_networkx bool

Return a networkx DiGraph if True.

False

Returns:

Type Description
dict | networkx.DiGraph

dict | networkx.DiGraph: Node-link graph data as a dict, or a networkx DiGraph if as_networkx is True.
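
A minimal sketch of requesting the graph as a networkx object and summarizing its size. It assumes `client` is an already-configured client object and that networkx is installed. The helper name `summarize_graph` is hypothetical. `number_of_nodes()` and `number_of_edges()` are standard networkx graph methods.

```python
def summarize_graph(client, dataset_id):
    """Return node and edge counts for the full connected component of a dataset."""
    # recursive=True traverses the whole connected component;
    # as_networkx=True asks for a networkx DiGraph instead of a dict.
    graph = client.datasets.graph(dataset_id, recursive=True, as_networkx=True)
    return {"nodes": graph.number_of_nodes(), "edges": graph.number_of_edges()}
```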