Core Concepts¶

Crucible organizes scientific data around a small set of object types: users, projects, datasets (with their files), samples, and instruments. Understanding how they relate makes it easier to structure your data.

Users¶

A user is a person or a service account with access to Crucible. Users can be members of access groups and projects. Creating a user profile requires an ORCID account. Working with the Crucible API locally requires an API key, which you can generate at https://crucible.lbl.gov/api/v2/user_apikey.

If your integration requires a service account with elevated privileges, please join Discord and submit a request for a service account.

Projects¶

A project is the top-level organizational unit. Projects commonly map to research projects or user facility proposals.

Projects serve two primary purposes:

Organization — resources belong to a project, making them easier to find and filter.
Access control — all members of a project can read the datasets and samples within it. Adding a user to a project grants them access to all associated data.

Every dataset and sample must belong to a project. The project_id is a short, human-readable identifier chosen at creation time (e.g., MFP12345).
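
The access rule above can be sketched as a small in-memory model. This is only an illustration of the membership logic, not the Crucible client or server implementation: the `Project`, `Resource`, and `can_read` names are hypothetical, and access groups are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical in-memory stand-in for a Crucible project."""
    project_id: str                                  # short, human-readable, e.g. "MFP12345"
    members: set[str] = field(default_factory=set)   # user identifiers (illustrative)


@dataclass
class Resource:
    """A dataset or sample; each one belongs to exactly one project."""
    name: str
    project_id: str


def can_read(user: str, resource: Resource, projects: dict[str, Project]) -> bool:
    """A user can read a resource if they are a member of its project."""
    project = projects.get(resource.project_id)
    return project is not None and user in project.members


projects = {"MFP12345": Project("MFP12345", members={"alice"})}
scan = Resource(name="xrd_scan_001", project_id="MFP12345")

print(can_read("alice", scan, projects))  # True
print(can_read("bob", scan, projects))    # False

# Adding bob to the project grants access to everything in it.
projects["MFP12345"].members.add("bob")
print(can_read("bob", scan, projects))    # True
```

Because access is granted per project rather than per dataset, it is worth deciding project boundaries before uploading data.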

Project management guide →

Datasets¶

A dataset is the core data object in Crucible. It can combine:

Files — the actual measurement data (optional).
Structured metadata — general attributes collected about each dataset regardless of data type, such as measurement, data_type, instrument_name, session_name, timestamp, and data_format. Other fields, such as creation_time, modification_time, and the download path, are generated and recorded on the server side.
Scientific metadata — free-form key-value pairs for experiment-specific parameters, notes, and comments needed for reproducibility and provenance. The structure is intentionally flexible: it accommodates a variety of use cases, reduces the input burden on experimentalists, and can adapt over time. Even so, standardizing the structure of the scientific metadata for each data type within a project or organization is recommended, because it improves data curation and enables downstream analytics.
Keywords — searchable tags associated with each dataset.
Thumbnails — small, low-resolution images that represent the results or underlying data in the dataset.
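
As a rough illustration of how these parts fit together, the sketch below builds a dataset record as a plain dictionary and checks its scientific metadata against a project-level convention. The structured field names come from the list above; the dictionary layout, the example values, the `REQUIRED_KEYS` convention, and the `missing_scientific_keys` helper are all assumptions made for illustration and are not part of the Crucible API.

```python
# Hypothetical dataset record. Field names follow the attributes listed above,
# but the values and the overall dict layout are illustrative only.
dataset = {
    "project_id": "MFP12345",
    "measurement": "xrd",
    "data_format": "h5",
    "instrument_name": "example-diffractometer",
    "session_name": "2024-run-01",
    "keywords": ["thin film", "calibration"],
    "scientific_metadata": {
        "wavelength_angstrom": 1.5406,
        "exposure_s": 2.0,
        "notes": "sample remounted before scan",
    },
}

# Assumed project-level convention: required scientific-metadata keys
# for each measurement type.
REQUIRED_KEYS = {"xrd": {"wavelength_angstrom", "exposure_s"}}


def missing_scientific_keys(record: dict) -> set[str]:
    """Return the required scientific-metadata keys that the record lacks."""
    required = REQUIRED_KEYS.get(record.get("measurement"), set())
    return required - set(record.get("scientific_metadata", {}))


print(missing_scientific_keys(dataset))  # set()
```

A check like this can run before upload, so that free-form metadata stays flexible while still following a shared convention within a project.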

Datasets can be linked to each other in parent-child relationships to represent processing pipelines (e.g., raw data → calibrated → analyzed) or collections of related data.
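
A processing pipeline expressed as parent-child links can be walked back to its raw data. The sketch below assumes each dataset has at most one parent and uses made-up identifiers; the `parent_of` mapping and the `lineage` helper are hypothetical and do not show how Crucible stores or queries these links.

```python
# Hypothetical child -> parent links between dataset identifiers.
parent_of = {
    "analyzed-01": "calibrated-01",
    "calibrated-01": "raw-01",
}


def lineage(dataset_id: str, parent_of: dict[str, str]) -> list[str]:
    """Return the chain from dataset_id back to its root ancestor."""
    chain = [dataset_id]
    seen = {dataset_id}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:
            # Guard against accidental cycles in the link data.
            raise ValueError(f"cycle detected at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


print(lineage("analyzed-01", parent_of))
# ['analyzed-01', 'calibrated-01', 'raw-01']
```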

Working with datasets →

Files¶

Files are the raw data objects attached to a dataset. Zero or more files may be added; each is stored in cloud storage and triggers an ingestion process to parse metadata and generate thumbnails. Files cannot exist independently of a dataset record.

Working with files and ingestion →

Samples¶

A sample represents a physical or computational material that data was collected from. Samples are useful for tracking provenance: which datasets were measured from which material, and how that material was prepared.

Like datasets, samples support parent-child hierarchies. A bulk crystal, a cleaved wafer cut from it, and a thin film deposited on the wafer can all be modeled as a parent-child sample chain.

A dataset can be linked to one or more samples, and a sample can be linked to one or more datasets.
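
The provenance question "which datasets came from this material or anything derived from it?" can be sketched by combining the sample hierarchy with the many-to-many dataset links. All identifiers below are made up, and the `datasets_from` helper is a hypothetical in-memory illustration rather than a Crucible client call.

```python
from collections import defaultdict

# Hypothetical sample chain, stored as child -> parent.
sample_parent = {
    "cleaved-wafer": "bulk-crystal",
    "thin-film": "cleaved-wafer",
}

# Many-to-many dataset <-> sample links, stored as (dataset, sample) pairs.
links = [
    ("xrd_scan_001", "thin-film"),
    ("xrd_scan_001", "cleaved-wafer"),
    ("afm_map_007", "thin-film"),
    ("laue_002", "bulk-crystal"),
]

# Index the hierarchy and the links for lookup.
children = defaultdict(set)
for child, parent in sample_parent.items():
    children[parent].add(child)

datasets_for = defaultdict(set)
for dataset_id, sample_id in links:
    datasets_for[sample_id].add(dataset_id)


def datasets_from(sample_id: str) -> set[str]:
    """Datasets linked to a sample or to any sample derived from it."""
    found: set[str] = set()
    stack = [sample_id]
    while stack:
        current = stack.pop()
        found |= datasets_for.get(current, set())
        stack.extend(children.get(current, ()))
    return found


print(sorted(datasets_from("cleaved-wafer")))
# ['afm_map_007', 'xrd_scan_001']
print(sorted(datasets_from("bulk-crystal")))
# ['afm_map_007', 'laue_002', 'xrd_scan_001']
```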

Working with samples →

Instruments¶

An instrument represents the physical equipment from which a dataset originated. Linking a dataset to an instrument (via instrument_name) provides a consistent way to filter and search data by the equipment that produced it.

Instruments are shared across projects and exist independently of any single project. Each distinct physical instrument should have its own entry.

If an instrument already exists in Crucible, reference it by name in your datasets. If not, create it first with client.instruments.create(). The method returns the existing record if an instrument with that name already exists.
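
The "return the existing record" behavior described for client.instruments.create() is a get-or-create pattern. The sketch below shows that pattern with an in-memory registry. It is not the Crucible client: the `InstrumentRegistry` class, the `Instrument` dataclass, and the `location` field are hypothetical, and the real method's parameters are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instrument:
    name: str
    location: str = ""   # hypothetical attribute, for illustration only


class InstrumentRegistry:
    """In-memory stand-in used only to illustrate get-or-create semantics."""

    def __init__(self) -> None:
        self._by_name: dict[str, Instrument] = {}

    def create(self, name: str, location: str = "") -> Instrument:
        # If the name is already registered, return that record unchanged.
        existing = self._by_name.get(name)
        if existing is not None:
            return existing
        instrument = Instrument(name=name, location=location)
        self._by_name[name] = instrument
        return instrument


registry = InstrumentRegistry()
first = registry.create("example-diffractometer", location="Lab A")
second = registry.create("example-diffractometer", location="Lab B")

print(first is second)   # True
print(second.location)   # Lab A
```

In this sketch the second call does not overwrite the first record, so calling create repeatedly with the same name is safe. Whether the real client ignores differing attributes in the same way is an assumption to verify against the instruments guide.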

Working with instruments →

IDs¶

Every object in Crucible has a system-assigned unique_id generated by the mfid package. The unique_id is the stable identifier you use to retrieve, update, link, and download dataset, sample, instrument, and associated file objects.