DatasetOperations
Access via client.datasets.
crucible.resources.datasets.DatasetOperations
Dataset-related API operations.
Access via: client.datasets.get(), client.datasets.list(), etc. File operations (upload, download, thumbnails, ingestion) are inherited from FileOperations and also available via client.files.*.
get(dsid, include_metadata=False, include_links=False)
Get dataset details, optionally including scientific metadata and links.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dsid | str | Dataset unique identifier | required |
| include_metadata | bool | Whether to include scientific metadata | False |
| include_links | bool | Whether to include immediate parent/child/associated links | False |
Returns:
| Name | Type | Description |
|---|---|---|
| Dict | Dict | Dataset object with optional metadata and links |
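The two flags can be illustrated with a minimal sketch. `StubDatasets` is a hypothetical in-memory stand-in for `client.datasets` so the snippet runs without a server, and the `scientific_metadata` and `links` keys are assumed names, not confirmed fields of the real response.

```python
from typing import Any, Dict


class StubDatasets:
    """Hypothetical stand-in for client.datasets, for illustration only."""

    def get(self, dsid: str, include_metadata: bool = False,
            include_links: bool = False) -> Dict[str, Any]:
        record: Dict[str, Any] = {"unique_id": dsid, "dataset_name": "demo run"}
        if include_metadata:
            # Assumed key name for the scientific metadata payload.
            record["scientific_metadata"] = {"beam_energy_kev": 200}
        if include_links:
            # Assumed key name and shape for the immediate links.
            record["links"] = {"parents": [], "children": [], "associated": []}
        return record


datasets = StubDatasets()

# Default call: dataset details only.
summary = datasets.get("ds-001")

# Opt in to the extra sections only when they are needed.
full = datasets.get("ds-001", include_metadata=True, include_links=True)
```

With a real client the same keyword arguments would be passed to `client.datasets.get()`; both flags default to False, so the extra sections appear only on request.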
list(sample_id=None, include_metadata=False, include_links=False, limit=DEFAULT_LIMIT, offset=0, **kwargs)
List datasets with optional filtering and automatic pagination.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample_id | str | If provided, returns datasets for this sample | None |
| limit | int | Maximum total results to return (default: 100). Requests above API_PAGE_MAX (1000) are handled transparently via parallel pagination. | DEFAULT_LIMIT |
| offset | int | Starting position in the full result set (default: 0) | 0 |
| include_metadata | bool | Include scientific metadata in results | False |
| include_links | bool | Include linked resources (parents, children, associated) per dataset | False |
| **kwargs | Any | Query parameters for filtering. Supported fields include: keyword, unique_id, public, dataset_name, file_to_upload, owner_orcid, project_id, instrument_name, source_folder, timestamp, size, data_format, data_type, measurement, session_name, sha256_hash_file_to_upload. Filters expect exact, case-sensitive matches, except for keyword, which is case-insensitive and matches substrings. | {} |
Returns:
| Type | Description |
|---|---|
| List[Dict] | Dataset objects matching filter criteria |
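The pagination and filter rules described above can be sketched locally. This is an illustration under stated assumptions, not the client's actual implementation: `plan_pages` and `matches` are hypothetical helpers, and the `keywords` key on a record is an assumed field name.

```python
from typing import Any, Dict, List, Optional, Tuple

API_PAGE_MAX = 1000  # per-request ceiling quoted in the limit description


def plan_pages(limit: int, offset: int = 0,
               page_max: int = API_PAGE_MAX) -> List[Tuple[int, int]]:
    """Split one large request into (offset, page_size) pairs."""
    pages: List[Tuple[int, int]] = []
    position, remaining = offset, limit
    while remaining > 0:
        size = min(remaining, page_max)
        pages.append((position, size))
        position += size
        remaining -= size
    return pages


def matches(record: Dict[str, Any], keyword: Optional[str] = None,
            **filters: Any) -> bool:
    """Apply the documented filter rules to a local record."""
    for field, wanted in filters.items():
        # Ordinary fields: exact, case-sensitive comparison.
        if record.get(field) != wanted:
            return False
    if keyword is not None:
        # Keyword filter: case-insensitive substring match.
        needle = keyword.lower()
        return any(needle in k.lower() for k in record.get("keywords", []))
    return True
```

For example, `plan_pages(2500)` yields three pages of 1000, 1000 and 500 results. How the real client schedules those requests in parallel is not shown here.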
create(dataset, scientific_metadata=None, keywords=None, files_to_upload=None, ingestor=None, verbose=False, wait_for_ingestion_response=False)
Create a new dataset record with scientific metadata and keywords.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | Dataset object with dataset details | required |
| scientific_metadata | dict | Scientific metadata | None |
| keywords | list | Keywords to associate with dataset | None |
Returns: Dict containing created_record, scientific_metadata_record, and dsid
update(dsid, **updates)
Update an existing dataset with new field values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dsid | str | Dataset unique identifier | required |
| **updates | Any | Fields to update (e.g., dataset_name="New Name", public=True) | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| Dict | Dict | Updated dataset object |
Example
client.datasets.update("my-dataset-id", dataset_name="Updated Name", public=True)
delete(dsid)
Delete a dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dsid | str | Dataset unique identifier | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Dict | Dict | Deletion confirmation |
add_keyword(dsid, keyword)
Add a keyword to a dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dsid | str | Dataset unique identifier | required |
| keyword | str | Keyword/tag to associate with dataset | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Dict | Dict | Keyword object with updated usage count |
get_keywords(dsid=None, limit=DEFAULT_LIMIT)
List keywords, optionally filtered by dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dsid | str | Dataset unique identifier to filter keywords | None |
| limit | int | Maximum number of results to return | DEFAULT_LIMIT |
Returns:
| Type | Description |
|---|---|
| List[Dict] | Keyword objects with keyword text and num_datasets counts |
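One way to combine add_keyword() and get_keywords() is to add only the tags a dataset does not already carry. The sketch below assumes each keyword object exposes its text under a `keyword` key (the documentation names only `num_datasets` explicitly). `StubDatasets` and `add_missing_keywords` are hypothetical names used so the example runs without a server.

```python
from typing import Any, Dict, List, Optional


class StubDatasets:
    """Hypothetical in-memory stand-in for client.datasets."""

    def __init__(self) -> None:
        self._by_dataset: Dict[str, List[str]] = {}

    def _count(self, keyword: str) -> int:
        # Number of datasets currently carrying this keyword.
        return sum(keyword in tags for tags in self._by_dataset.values())

    def add_keyword(self, dsid: str, keyword: str) -> Dict[str, Any]:
        tags = self._by_dataset.setdefault(dsid, [])
        if keyword not in tags:
            tags.append(keyword)
        return {"keyword": keyword, "num_datasets": self._count(keyword)}

    def get_keywords(self, dsid: Optional[str] = None,
                     limit: int = 100) -> List[Dict[str, Any]]:
        if dsid is not None:
            words = list(self._by_dataset.get(dsid, []))
        else:
            words = sorted({w for tags in self._by_dataset.values() for w in tags})
        return [{"keyword": w, "num_datasets": self._count(w)}
                for w in words[:limit]]


def add_missing_keywords(datasets: Any, dsid: str, wanted: List[str]) -> List[str]:
    """Add each wanted keyword once, skipping ones already on the dataset."""
    existing = {k["keyword"] for k in datasets.get_keywords(dsid=dsid)}
    added: List[str] = []
    for word in wanted:
        if word not in existing:
            datasets.add_keyword(dsid, word)
            existing.add(word)
            added.append(word)
    return added
```

Because get_keywords() is capped by `limit`, a dataset with more keywords than the limit would need a larger value for the existence check to be complete.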
add_sample(dataset_id, sample_id)
Link a sample to a dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_id | str | Dataset unique identifier | required |
| sample_id | str | Sample unique identifier | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Dict | Dict | Information about the created link |
remove_sample(dataset_id, sample_id)
Remove the link between a dataset and a sample.
Requires admin permissions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_id | str | Dataset unique identifier | required |
| sample_id | str | Sample unique identifier | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Dict | Dict | Deletion confirmation |
link_parent_child(parent_dataset_id, child_dataset_id)
Link a derived dataset to a parent dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| parent_dataset_id | str | The unique ID for the parent dataset | required |
| child_dataset_id | str | The unique ID for the derived dataset | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Dict | Dict | Information about the created link |
remove_child(parent_dataset_id, child_dataset_id)
Remove the parent-child link between two datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| parent_dataset_id | str | The unique ID of the parent dataset | required |
| child_dataset_id | str | The unique ID of the child dataset | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Dict | Dict | Deletion confirmation |
list_parents(child_dataset_id, limit=DEFAULT_LIMIT, offset=0, **kwargs)
List the parents of a given dataset with optional filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| child_dataset_id | str | The unique ID of the dataset for which you want to find the parents | required |
| limit | int | Maximum number of results to return | DEFAULT_LIMIT |
| offset | int | Starting position in the full result set (default: 0) | 0 |
| **kwargs | Any | Query parameters for filtering datasets | {} |
Returns:
| Type | Description |
|---|---|
| List[Dict] | Parent datasets |
list_children(parent_dataset_id, limit=DEFAULT_LIMIT, offset=0, **kwargs)
List the children of a given dataset with optional filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| parent_dataset_id | str | The unique ID of the dataset for which you want to find the children | required |
| limit | int | Maximum number of results to return | DEFAULT_LIMIT |
| offset | int | Starting position in the full result set (default: 0) | 0 |
| **kwargs | Any | Query parameters for filtering datasets | {} |
Returns:
| Type | Description |
|---|---|
| List[Dict] | Child datasets |
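Since list_children() returns only immediate children, collecting every derived dataset takes repeated calls. The following is a minimal breadth-first sketch; `StubDatasets` and `descendants` are hypothetical names, and reading each dataset's identifier from a `unique_id` key is an assumption based on the filter fields listed for list().

```python
from collections import deque
from typing import Any, Dict, List


class StubDatasets:
    """Hypothetical in-memory stand-in for client.datasets."""

    def __init__(self) -> None:
        self._children: Dict[str, List[str]] = {}

    def link_parent_child(self, parent_dataset_id: str,
                          child_dataset_id: str) -> Dict[str, str]:
        self._children.setdefault(parent_dataset_id, []).append(child_dataset_id)
        return {"parent": parent_dataset_id, "child": child_dataset_id}

    def list_children(self, parent_dataset_id: str, limit: int = 100,
                      offset: int = 0, **kwargs: Any) -> List[Dict[str, Any]]:
        ids = self._children.get(parent_dataset_id, [])
        return [{"unique_id": i} for i in ids[offset:offset + limit]]


def descendants(datasets: Any, root_id: str) -> List[str]:
    """Collect all derived dataset IDs below root_id, breadth first."""
    seen = {root_id}
    order: List[str] = []
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in datasets.list_children(current):
            child_id = child["unique_id"]
            if child_id not in seen:  # guard against revisiting shared children
                seen.add(child_id)
                order.append(child_id)
                queue.append(child_id)
    return order
```

Each call here uses the default limit, so a parent with more children than that would need an explicit `limit` or `offset` loop. When the whole connected component is wanted, graph() with recursive=True may be the simpler route.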
graph(dataset_id, recursive=False, as_networkx=False)
Return the graph of entities connected to this dataset.
Delegates to client.graphs.get(). See GraphOperations.get() for full docs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_id | str | Dataset unique identifier. | required |
| recursive | bool | If True, traverse the full connected component. | False |
| as_networkx | bool | Return a networkx DiGraph if True. | False |
Returns:
| Type | Description |
|---|---|
| dict \| networkx.DiGraph | Node-link graph data. |
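When as_networkx is False, the result is a plain dict. The sketch below turns such a dict into an adjacency mapping using only the standard library. It assumes the dict follows the common networkx node-link layout (`nodes` entries with an `id`, `links` entries with `source` and `target`); the actual key names returned by the API are not confirmed here, and `adjacency` and `sample_graph` are hypothetical.

```python
from typing import Any, Dict, List


def adjacency(graph: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each node ID to the IDs its outgoing links point at."""
    adj: Dict[str, List[str]] = {node["id"]: [] for node in graph.get("nodes", [])}
    for link in graph.get("links", []):
        # setdefault covers links whose source is missing from the node list.
        adj.setdefault(link["source"], []).append(link["target"])
    return adj


# Hand-written sample in the assumed node-link layout.
sample_graph = {
    "nodes": [{"id": "raw"}, {"id": "proc"}, {"id": "sample-1"}],
    "links": [
        {"source": "raw", "target": "proc"},
        {"source": "sample-1", "target": "raw"},
    ],
}

neighbours = adjacency(sample_graph)
```

Passing as_networkx=True avoids this step entirely when networkx is available, since the DiGraph already provides traversal helpers.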