lamindb.Record

class lamindb.Record(name: str, type: Record | None = None, is_type: bool = False, description: str | None = None, schema: Schema | None = None, reference: str | None = None, reference_type: str | None = None)

Bases: SQLRecord, HasType, CanCurate, TracksRun, TracksUpdates, HasParents

Flexible metadata records for labeling and organizing entities.

Useful for managing samples, donors, cells, compounds, sequences, and other custom entities.

Parameters:
  • name (str) – A name.

  • description (str | None, default: None) – A description.

  • type (Record | None, default: None) – The type of this record.

  • is_type (bool, default: False) – Whether this record is a type (a record that classifies other records).

  • schema (Schema | None, default: None) – A schema defining allowed features for records of this type. Only applicable when is_type=True.

  • reference (str | None, default: None) – For instance, an external ID or a URL.

  • reference_type (str | None, default: None) – For instance, "url".

See also

Feature()

Dimensions of measurement (e.g. column of a sheet, attribute of a record).

Examples

Create a record and annotate an Artifact:

sample1 = ln.Record(name="Sample 1").save()
artifact.records.add(sample1)

Group several records under a record type:

experiment_type = ln.Record(name="Experiment", is_type=True).save()
experiment1 = ln.Record(name="Experiment 1", type=experiment_type).save()
experiment2 = ln.Record(name="Experiment 2", type=experiment_type).save()

Export all records of that type to dataframe:

experiment_type.records.to_dataframe()
#>              name   ...
#>      Experiment 1   ...
#>      Experiment 2   ...

Add features to a record:

gc_content = ln.Feature(name="gc_content", dtype=float).save()
experiment = ln.Feature(name="experiment", dtype=experiment_type).save()
sample1.features.add_values({
    "gc_content": 0.5,
    "experiment": "Experiment 1",
})

Constrain features by using a Schema, creating a sheet:

schema = ln.Schema([gc_content, experiment], name="sample_schema").save()
sheet = ln.Record(name="Sample", is_type=True, schema=schema).save()  # add schema to type
sample2 = ln.Record(name="Sample 2", type=sheet).save()
sample2.features.add_values({"gc_content": 0.6})  # raises ValidationError because experiment is missing

Query records by features:

ln.Record.filter(gc_content=0.55)     # exact match
ln.Record.filter(gc_content__gt=0.5)  # greater than
ln.Record.filter(type=sheet)          # only the records on the sheet
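The double-underscore suffixes in these queries (for example gc_content__gt) follow the Django lookup convention of "field name, two underscores, comparison". The following plain-Python sketch illustrates that convention on a list of dictionaries. It does not use lamindb, the helper matches() and the LOOKUPS table are hypothetical, and it ignores relation traversal such as created_by__name.

```python
import operator

# Hypothetical lookup table: suffix -> comparison function.
LOOKUPS = {
    "exact": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches(row: dict, **expressions) -> bool:
    """Return True if the row satisfies every 'field__lookup=value' expression."""
    for key, expected in expressions.items():
        # "gc_content__gt" -> ("gc_content", "gt"); no suffix means exact match.
        name, _, lookup = key.partition("__")
        compare = LOOKUPS[lookup or "exact"]
        if name not in row or not compare(row[name], expected):
            return False
    return True


rows = [
    {"name": "Sample 1", "gc_content": 0.5},
    {"name": "Sample 2", "gc_content": 0.6},
]
exact = [r["name"] for r in rows if matches(r, gc_content=0.5)]
greater = [r["name"] for r in rows if matches(r, gc_content__gt=0.5)]
print(exact, greater)
```

In lamindb itself, the filtering happens in the database rather than in Python; the sketch only shows how the keyword names are read.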

Model custom ontologies through their parents/children attributes:

cell_type = ln.Record(name="CellType", is_type=True).save()
t_cell = ln.Record(name="T Cell", type=cell_type).save()
cd4_t_cell = ln.Record(name="CD4+ T Cell", type=cell_type).save()
t_cell.children.add(cd4_t_cell)

If you work with basic biological entities like cell lines, cell types, tissues, consider building on the public biological ontologies in bionty.

What is the difference between Record and SQLRecord?

The features of a Record are flexible: you can dynamically define features and add features to a record. The fields of a SQLRecord are fixed: you need to define them in code and then migrate the underlying database.

You can configure a SQLRecord by subclassing it in a custom schema, for example, as done here: github.com/laminlabs/wetlab
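The contrast between fixed fields and flexible features can be pictured in plain Python. In this sketch, a dataclass stands in for a SQLRecord and a dict-backed object stands in for a Record. Both class names are hypothetical and the code does not reflect how lamindb stores either kind of object.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FixedFieldsEntity:
    """Stand-in for a SQLRecord: the set of fields is defined in code."""

    name: str
    description: Optional[str] = None


@dataclass
class FlexibleEntity:
    """Stand-in for a Record: features are attached at runtime."""

    name: str
    features: dict[str, Any] = field(default_factory=dict)


flexible = FlexibleEntity(name="Sample 1")
# Adding a new feature needs no change to the class definition.
flexible.features["gc_content"] = 0.5

# A fixed-field class rejects unknown fields; adding one means editing the class
# (and, for a database-backed model, migrating the database).
try:
    FixedFieldsEntity(name="Sample 2", gc_content=0.5)
    rejected = False
except TypeError:
    rejected = True
print(flexible.features, rejected)
```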

Attributes

property features: FeatureManager

Manage annotations with features.

property is_sheet: bool

Check if record is a sheet, i.e., self.is_type and self.schema is not None.

Simple fields

uid: str

A universal random id, valid across DB instances.

name: str

Name or title of record (optional).

Names for a given type and space are constrained to be unique.

is_type: bool

Indicates if record is a type.

For example, if a record “Compound” is a type, the actual compounds “darerinib”, “tramerinib”, would be instances of that type.

description: str | None

A description.

reference: str | None

A simple reference like a URL or external ID.

reference_type: str | None

Type of simple reference.

params: dict | None

For example, to hold additional data in a row in a sheet, not validated as features.

is_locked: bool

Whether the record is locked for edits.

created_at: datetime

Time of creation of record.

updated_at: datetime

Time of last update to record.

Relational fields

branch: Branch

Life cycle state of record.

branch.name can be “main” (the default branch), “trash” (trashed records), “archive” (archived records), or the name of any other user-created branch, typically one planned for merging onto main after review.

space: Space

The space in which the record lives.

created_by: User

Creator of record.

type: Record | None

Type of record, e.g., Sample, Donor, Cell, Compound, Sequence.

Allows grouping records by type, e.g., all samples, all donors, all cells, all compounds, all sequences.

schema: Schema | None

A schema to enforce for a type.

This is analogous to the schema attribute of an Artifact. If is_type is True, the schema is used to enforce features for each record of this type.
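What "enforcing features" means can be sketched without lamindb: every feature named in the schema must receive a value, otherwise the values are rejected. The function and error class below are hypothetical; lamindb raises its own ValidationError and may perform further checks (such as data types) that this sketch leaves out.

```python
class MissingFeatureError(ValueError):
    """Hypothetical error type for this sketch."""


def check_required_features(values: dict, schema_features: list[str]) -> None:
    """Raise if any feature named in the schema has no value."""
    missing = [name for name in schema_features if name not in values]
    if missing:
        raise MissingFeatureError(f"missing features: {missing}")


schema_features = ["gc_content", "experiment"]

# Complete values pass silently.
check_required_features(
    {"gc_content": 0.6, "experiment": "Experiment 1"}, schema_features
)

# Incomplete values are rejected, similar to a sheet row that omits a feature.
try:
    check_required_features({"gc_content": 0.6}, schema_features)
    error_message = None
except MissingFeatureError as err:
    error_message = str(err)
print(error_message)
```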

run: Run | None

Run that created the record.

components: Record

Records linked in this record as a value.

parents: Record

Ontological parents of this record.

You can build an ontology under a given type. For example, introduce a type CellType and model the hierarchy of cell types under it via parents and children.

input_of_runs: Run

Runs that use this record as an input.

artifacts: Artifact

Artifacts annotated by this record.

runs: Run

Runs annotated by this record.

transforms: Transform

Transforms annotated by this record.

collections: Collection

Collections annotated by this record.

records: Record

If a type (is_type=True), records of this type.

composites: Record

Records linking this record as a value. Is reverse accessor for components.

children: Record

Ontological children of this record. Is reverse accessor for parents.

references: Reference

References that annotate this record.

projects: Project

Projects that annotate this record.

blocks: RunBlock

Blocks that annotate this record.

linked_users: User

Users linked in this record as values.

linked_runs: Run

Runs linked in this record as values.

linked_transforms: Transform

Transforms linked in this record as values.

linked_ulabels: ULabel

ULabels linked in this record as values.

linked_artifacts: Artifact

Artifacts linked in this record as values.

linked_collections: Collection

Collections linked in this record as values.

linked_references: Reference

References linked in this record as values.

linked_projects: Project

Projects linked in this record as values.

values_json: RecordJson

JSON values (record_id, feature_id, value).

values_record: RecordRecord

Record values with their features (record_id, feature_id, value_id).

values_ulabel: RecordULabel

ULabel values with their features (record_id, feature_id, value_id).

values_user: RecordUser

User values with their features (record_id, feature_id, value_id).

values_run: RecordRun

Run values with their features (record_id, feature_id, value_id).

values_artifact: RecordArtifact

Artifact values with their features (record_id, feature_id, value_id).

values_collection

Collection values with their features (record_id, feature_id, value_id).

values_transform

Transform values with their features (record_id, feature_id, value_id).

values_reference: RecordReference

Reference values with their features (record_id, feature_id, value_id).

values_project: RecordProject

Project values with their features (record_id, feature_id, value_id).

Class methods

classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

Examples

>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()

classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord

Examples

record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")

classmethod to_dataframe(include=None, features=False, limit=100)

Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries

Parameters:
  • include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).

  • features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.

  • limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.

  • order_by – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.

Return type:

DataFrame

Examples

Include the name of the creator:

ln.Record.to_dataframe(include="created_by__name")

Include features:

ln.Artifact.to_dataframe(include="features")

Include selected features:

ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])

classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum amount of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A sorted DataFrame of search results with a score in the score column. If return_queryset is True, a QuerySet.

See also

filter() lookup()

Examples

records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")

classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records.

    - "first": return the first record.
    - "last": return the last record.
    - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.
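The idea of a NamedTuple with a dictionary converter can be sketched in plain Python: each field value is normalized to a valid attribute name (for example "ADGB-DT" becomes adgb_dt) for auto-completion, while a dictionary keeps the raw values as keys. The helpers to_identifier() and build_lookup() and the record dictionaries are hypothetical and are not lamindb's implementation.

```python
import re
from collections import namedtuple


def to_identifier(value: str) -> str:
    """Lowercase a value and replace runs of non-word characters with '_'.

    Simplification: values that start with a digit are not handled.
    """
    return re.sub(r"\W+", "_", value).strip("_").lower()


def build_lookup(records: list[dict], field: str):
    """Return (namedtuple keyed by identifier, dict keyed by raw value)."""
    by_identifier = {to_identifier(r[field]): r for r in records}
    Lookup = namedtuple("Lookup", list(by_identifier))
    by_value = {r[field]: r for r in records}
    return Lookup(**by_identifier), by_value


# Hypothetical records; only the "symbol" field matters here.
genes = [{"symbol": "ADGB-DT", "id": 1}, {"symbol": "A1CF", "id": 2}]
lookup, lookup_dict = build_lookup(genes, field="symbol")

print(lookup.adgb_dt)          # attribute access, suitable for auto-complete
print(lookup_dict["ADGB-DT"])  # dictionary access with the raw value
```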

See also

search()

Examples

Look up via auto-complete on attribute access (.):

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt

Look up via auto-complete in dictionary:

lookup_dict = lookup.dict()
lookup_dict['ADGB-DT']

Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745

Return a specific field value instead of the full record:

lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")

classmethod connect(instance)

Query a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

ln.Record.connect("account_handle/instance_name").search("label7", field="name")

classmethod inspect(values, field=None, *, mute=False, organism=None, source=None, from_source=True, strict_source=False)

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:
  • values (list[str] | Series | array) – Values that will be checked against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to inspect against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry.

    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.

    Note: this parameter won’t affect validation against public sources.

Return type:

bionty.base.dev.InspectResult

See also

validate()

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# inspect gene symbols
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol, organism="human")
assert result.validated == ["A1CF", "A1BG"]
assert result.non_validated == ["FANCD1", "FANCD20"]

classmethod validate(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches count as validated.

Parameters:
  • values (list[str] | Series | array) – Values that will be validated against the field.

  • field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.

  • mute (bool, default: False) – Whether to mute logging.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry.

    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.

    Note: this parameter won’t affect validation against public sources.

Return type:

ndarray

Returns:

A vector of booleans indicating if an element is validated.
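The exact-match semantics can be sketched in plain Python: each input value yields one boolean, and a synonym of a registered value does not count as a match. The function validate_exact() is hypothetical and returns a list, whereas validate() returns an ndarray.

```python
def validate_exact(values: list[str], registry_values: set[str]) -> list[bool]:
    """Return one boolean per input value: True only on an exact match."""
    return [value in registry_values for value in values]


registry_symbols = {"A1CF", "A1BG", "BRCA2"}
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]

# "FANCD1" is a synonym of "BRCA2", not an exact match, so it is not validated.
flags = validate_exact(gene_symbols, registry_symbols)
print(flags)
```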

See also

inspect()

Example:

import bionty as bt

bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.validate(gene_symbols, field=bt.Gene.symbol, organism="human")
#> array([ True,  True, False, False])

classmethod from_values(values, field=None, create=False, organism=None, source=None, mute=False)

Bulk create validated records by parsing values for an identifier (such as a name or an id).

Parameters:
  • values (list[str] | Series | array) – A list of values for an identifier, e.g. ["name1", "name2"].

  • field (str | DeferredAttribute | None, default: None) – A SQLRecord field to look up, e.g., bt.CellMarker.name.

  • create (bool, default: False) – Whether to create records if they don’t exist.

  • organism (SQLRecord | str | None, default: None) – A bionty.Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record to validate against when creating records.

  • mute (bool, default: False) – Whether to mute logging.

Return type:

SQLRecordList

Returns:

A list of validated records. For bionty registries, also returns knowledge-coupled records.

Notes

For more info, see tutorial: Manage biological ontologies.

Example:

import bionty as bt
import lamindb as ln

# Without create=True, non-validated values log warnings and yield an empty list
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"])
assert len(ulabels) == 0

# With create=True, records are created for values that don't exist yet
ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], create=True).save()
assert len(ulabels) == 3

# Bulk create records from a public reference
bt.CellType.from_values(["T cell", "B cell"]).save()

classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, source_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None, strict_source=False)

Maps input synonyms to standardized names.

Parameters:
  • values (Iterable) – Identifiers that will be standardized.

  • field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. Defaults to field.

  • return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.

  • case_sensitive (bool, default: False) – Whether the mapping is case sensitive.

  • mute (bool, default: False) – Whether to mute logging.

  • source_aware (bool, default: True) – Whether to standardize from public source. Defaults to True for BioRecord registries.

  • keep (Literal['first', 'last', False], default: 'first') –

    When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:

    - "first": returns the first mapped standardized name
    - "last": returns the last mapped standardized name
    - False: returns all mapped standardized names.

    When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

    When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.

  • synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.

  • organism (str | SQLRecord | None, default: None) – An Organism name or record.

  • source (SQLRecord | None, default: None) – A bionty.Source record that specifies the version to validate against.

  • strict_source (bool, default: False) – Determines the validation behavior against records in the registry.

    - If False, validation will include all records in the registry, ignoring the specified source.
    - If True, validation will only include records in the registry that are linked to the specified source.

    Note: this parameter won’t affect validation against public sources.

Return type:

list[str] | dict[str, str]

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.
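The interplay of return_mapper and keep can be sketched in plain Python, assuming synonyms are stored as a "|"-separated string per standardized name. The function standardize_sketch() and the names NameA and NameB are hypothetical, and case sensitivity and sources are ignored here.

```python
def standardize_sketch(values, synonyms_by_name, keep="first", return_mapper=False):
    """Map synonyms to standardized names; unknown values pass through unchanged."""
    # Invert {name: "syn1|syn2"} into {synonym: [names...]}.
    names_by_synonym = {}
    for name, synonyms in synonyms_by_name.items():
        for synonym in filter(None, synonyms.split("|")):
            names_by_synonym.setdefault(synonym, []).append(name)

    def pick(candidates):
        if keep == "first":
            return candidates[0]
        if keep == "last":
            return candidates[-1]
        # keep is False: return all names, nested only when there are duplicates.
        return candidates if len(candidates) > 1 else candidates[0]

    mapper = {v: pick(names_by_synonym[v]) for v in values if v in names_by_synonym}
    if return_mapper:
        return mapper
    return [mapper.get(v, v) for v in values]


synonyms_by_name = {"A1CF": "", "A1BG": "", "BRCA2": "FANCD1"}
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
standardized = standardize_sketch(gene_synonyms, synonyms_by_name)
mapper = standardize_sketch(gene_synonyms, synonyms_by_name, return_mapper=True)

# A synonym shared by two (hypothetical) names shows the effect of keep.
shared = {"NameA": "shared", "NameB": "shared"}
all_names = standardize_sketch(["shared"], shared, keep=False)
print(standardized, mapper, all_names)
```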

See also

add_synonym()

Add synonyms.

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save some gene records
bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol", organism="human").save()

# standardize gene synonyms
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
bt.Gene.standardize(gene_synonyms)
#> ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']

Methods

query_parents()

Query all parents of a record recursively.

While .parents retrieves the direct parents, this method retrieves all ancestors of the current record.

Return type:

QuerySet
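The difference between direct parents and all ancestors can be sketched with a plain dictionary that maps each name to its direct parents. The function all_ancestors() and the "Lymphocyte" entry are hypothetical; lamindb resolves the hierarchy in the database and returns a QuerySet.

```python
def all_ancestors(node: str, parents_of: dict[str, list[str]]) -> list[str]:
    """Collect every ancestor of node by following parent links transitively."""
    seen: list[str] = []
    stack = list(parents_of.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(parents_of.get(current, []))
    return seen


# Hypothetical hierarchy: child -> direct parents.
parents_of = {
    "CD4+ T Cell": ["T Cell"],
    "T Cell": ["Lymphocyte"],
    "Lymphocyte": [],
}

direct = parents_of["CD4+ T Cell"]                    # like .parents
ancestors = all_ancestors("CD4+ T Cell", parents_of)  # like query_parents()
print(direct, ancestors)
```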

query_children()

Query all children of a record recursively.

While .children retrieves the direct children, this method retrieves all descendants of a parent.

Return type:

QuerySet

query_records()

Query records of sub types.

While .records retrieves the records with the current type, this method also retrieves sub types and the records with sub types of the current type.

Return type:

QuerySet
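A plain-Python sketch of the recursion, assuming each record stores the name of its type: the direct lookup returns only records whose type is the given type, while the recursive variant also descends into sub types. All names below (type_of, records_of, records_recursive, "BloodSample") are hypothetical.

```python
# Hypothetical registry: record name -> name of its type (None for top level).
type_of = {
    "Sample": None,           # a type
    "BloodSample": "Sample",  # a sub type of Sample
    "Sample 1": "Sample",
    "Blood 1": "BloodSample",
}
type_names = {"Sample", "BloodSample"}  # records with is_type=True


def records_of(type_name: str) -> list[str]:
    """Records whose type is exactly type_name (like .records)."""
    return [name for name, t in type_of.items() if t == type_name]


def records_recursive(type_name: str) -> list[str]:
    """Records of type_name plus sub types and their records."""
    found = []
    for name in records_of(type_name):
        found.append(name)
        if name in type_names:
            found.extend(records_recursive(name))
    return found


direct_records = records_of("Sample")
all_records = records_recursive("Sample")
print(direct_records, all_records)
```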

type_to_dataframe(recurse=False)

Export all instances of this record type to a pandas DataFrame.

This is almost equivalent to:

ln.Record.filter(type=sample_type).to_dataframe(include="features")

type_to_dataframe() ensures that the columns are ordered according to the schema of the type and encodes fields like uid and name.

Parameters:

recurse (bool, default: False) – Whether to include records of sub-types recursively.

Return type:

DataFrame

to_artifact(key=None)

Calls type_to_dataframe() to create an artifact.

Return type:

Artifact

restore()

Restore from trash onto the main branch.

Does not restore descendant records if the record is HasType with is_type = True.

Return type:

None

delete(permanent=None, **kwargs)

Delete record.

If record is HasType with is_type = True, deletes all descendant records, too.

Parameters:

permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). If None, performs soft delete if the record is not already in the trash.

Return type:

None
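The effect of the permanent argument can be sketched with a small stand-in object. The Item class and delete_sketch() function are hypothetical. The sketch also assumes that calling delete with permanent=None on a record that is already in the trash removes it permanently; the description above only states the soft-delete case.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a record with a branch."""

    name: str
    branch: str = "main"
    exists: bool = True


def delete_sketch(item: Item, permanent=None) -> None:
    """Move to trash (soft delete) or remove entirely (permanent delete)."""
    if permanent is None:
        # Assumption: an item already in the trash is deleted permanently.
        permanent = item.branch == "trash"
    if permanent:
        item.exists = False
    else:
        item.branch = "trash"


item = Item("Sample 1")
delete_sketch(item)  # first call: soft delete
after_first = (item.branch, item.exists)
delete_sketch(item)  # second call: item is in trash already
after_second = (item.branch, item.exists)
print(after_first, after_second)
```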

Examples

For any SQLRecord object record, call:

>>> record.delete()

save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

TypeVar(T, bound=SQLRecord)

query_types()

Query types of a record recursively.

While .type retrieves the type, this method retrieves all super types of that type:

# Create a type hierarchy
type1 = ln.Record(name="Type1", is_type=True).save()
type2 = ln.Record(name="Type2", is_type=True, type=type1).save()
type3 = ln.Record(name="Type3", is_type=True, type=type2).save()

# Create a record with type3
record = ln.Record(name="Record3", type=type3).save()

# Query super types
super_types = record.query_types()
assert super_types[0] == type3
assert super_types[1] == type2
assert super_types[2] == type1

Return type:

SQLRecordList

add_synonym(synonym, force=False, save=None)

Add synonyms to a record.

Parameters:
  • synonym (str | list[str] | Series | array) – The synonyms to add to the record.

  • force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.

  • save (bool | None, default: None) – Whether to save the record to the database.

See also

remove_synonym()

Remove synonyms.

Example:

import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# add a synonym
record.add_synonym("T cells")
record.synonyms
#> "T cells|T-cell|T-lymphocyte|T lymphocyte"

remove_synonym(synonym)

Remove synonyms from a record.

Parameters:

synonym (str | list[str] | Series | array) – The synonym values to remove.

See also

add_synonym()

Add synonyms.

Example:

import bionty as bt

# save "T cell" record
record = bt.CellType.from_source(name="T cell").save()
record.synonyms
#> "T-cell|T lymphocyte|T-lymphocyte"

# remove a synonym
record.remove_synonym("T-cell")
record.synonyms
#> "T lymphocyte|T-lymphocyte"

set_abbr(value)

Set value for abbr field and add to synonyms.

Parameters:

value (str) – A value for an abbreviation.

See also

add_synonym()

Example:

import bionty as bt

# save an experimental factor record
scrna = bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
assert scrna.abbr is None
assert scrna.synonyms == "single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing"

# set abbreviation
scrna.set_abbr("scRNA")
assert scrna.abbr == "scRNA"
# synonyms are updated
assert scrna.synonyms == "scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq"

refresh_from_db(using=None, fields=None, from_queryset=None)

Reload field values from the database.

By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.

Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.

When accessing deferred fields of an instance, the deferred loading of the field will call this method.

async arefresh_from_db(using=None, fields=None, from_queryset=None)

view_parents(field=None, with_children=False, distance=5)

View parents in an ontology.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – Field to display on graph

  • with_children (bool, default: False) – Whether to also show children.

  • distance (int, default: 5) – Maximum distance still shown.

Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).

Examples

>>> import bionty as bt
>>> bt.Tissue.from_source(name="subsegmental bronchus").save()
>>> record = bt.Tissue.get(name="respiratory tube")
>>> record.view_parents()
>>> record.view_parents(with_children=True)

view_children(field=None, distance=5)

View children in an ontology.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – Field to display on graph

  • distance (int, default: 5) – Maximum distance still shown.

Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).

Examples

>>> import bionty as bt
>>> bt.Tissue.from_source(name="subsegmental bronchus").save()
>>> record = bt.Tissue.get(name="respiratory tube")
>>> record.view_children()
>>> record.view_children(distance=2)