
KeyValueStore

A key-value store is a storage for reading and writing data records identified by unique keys.

The key-value store class acts as a high-level interface for storing, retrieving, and managing data records identified by unique string keys. It abstracts away the underlying storage implementation details, allowing you to work with the same API regardless of whether data is stored in memory, on disk, or in the cloud.

Each data record is associated with a specific MIME content type, allowing storage of various data formats such as JSON, text, images, HTML snapshots or any binary data. This class is commonly used to store inputs, outputs, and other artifacts of crawler operations.

You can instantiate a key-value store using the open class method, which opens an existing store with the specified name or id, or creates a new one if none exists. The underlying storage implementation is determined by the configured storage client.

Usage

import asyncio

from crawlee.storages import KeyValueStore


async def main() -> None:
    # Open a named key-value store
    kvs = await KeyValueStore.open(name='my-store')

    # Store a record, then retrieve it using the same key
    await kvs.set_value('product-1234', [{'name': 'Smartphone', 'price': 799.99}])
    product = await kvs.get_value('product-1234')


asyncio.run(main())

Methods

__init__

  • __init__(client, id, name): None
  • Initialize a new instance.

    Preferably use the KeyValueStore.open constructor to create a new instance.


    Parameters

    • client: KeyValueStoreClient

      An instance of a storage client.

    • id: str

      The unique identifier of the storage.

    • name: str | None

      The name of the storage, if available.

    Returns None

delete_value

  • async delete_value(key): None
  • Delete a value from the key-value store (KVS).


    Parameters

    • key: str

      Key of the record to delete.

    Returns None

drop

  • async drop(): None
  • Drop the storage, removing it from the underlying storage client and clearing the cache.


    Returns None

get_auto_saved_value

  • async get_auto_saved_value(key, default_value): dict[str, JsonSerializable]
  • Get a value from the KVS that is automatically saved when it changes.


    Parameters

    • key: str

      Key of the record in which the value is stored.

    • optional default_value: dict[str, JsonSerializable] | None = None

      Value to be used if the record does not exist yet. Should be a dictionary.

    Returns dict[str, JsonSerializable]

get_metadata

get_public_url

  • async get_public_url(key): str
  • Get the public URL for the given key.


    Parameters

    • key: str

      Key of the record for which the public URL is required.

    Returns str

get_value

  • async get_value(key: str, default_value?: T | None): T | None
  • async get_value(key: str): Any
  • async get_value(key: str, default_value: T): T
  • Get a value from the KVS.


    Parameters

    • key: str

      Key of the record to retrieve.

    • optional default_value: T | None = None

      Default value returned in case the record does not exist.

    Returns T | None
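
The default_value behaviour described above can be sketched with a small helper. To keep the snippet runnable without Crawlee installed, FakeStore is a hypothetical in-memory stand-in that mirrors only the documented get_value and set_value signatures; load_settings and the 'settings' key are likewise illustrative names, not part of the library.

```python
import asyncio
from typing import Any, Optional


class FakeStore:
    """Hypothetical in-memory stand-in mirroring the documented signatures."""

    def __init__(self) -> None:
        self._records: dict[str, Any] = {}

    async def set_value(self, key: str, value: Any, content_type: Optional[str] = None) -> None:
        self._records[key] = value

    async def get_value(self, key: str, default_value: Any = None) -> Any:
        # The default is returned only when the record does not exist.
        return self._records.get(key, default_value)


async def load_settings(store: Any) -> dict[str, Any]:
    # Fall back to a default when the record has not been written yet.
    return await store.get_value('settings', {'retries': 3})


async def demo() -> tuple[dict[str, Any], dict[str, Any]]:
    store = FakeStore()
    before = await load_settings(store)  # record missing, default returned
    await store.set_value('settings', {'retries': 5})
    after = await load_settings(store)  # record present, stored value returned
    return before, after


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

With Crawlee, the same helper should accept the object returned by KeyValueStore.open in place of FakeStore, since it relies only on the get_value signature listed above.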

iterate_keys

  • async iterate_keys(exclusive_start_key, limit): AsyncIterator[KeyValueStoreRecordMetadata]
  • Iterate over the existing keys in the KVS.


    Parameters

    • optional exclusive_start_key: str | None = None

      Key to start the iteration from.

    • optional limit: int | None = None

      Maximum number of keys to return. None means no limit.

    Returns AsyncIterator[KeyValueStoreRecordMetadata]
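
Because iterate_keys returns an AsyncIterator, it is consumed with async for. The sketch below filters keys by prefix while iterating. FakeStore and FakeRecordMetadata are hypothetical stand-ins so the snippet runs without Crawlee; the key attribute on the metadata object, the sorted ordering, and the "strictly after" reading of exclusive_start_key are assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator, Optional


@dataclass
class FakeRecordMetadata:
    """Hypothetical stand-in for KeyValueStoreRecordMetadata; only `key` is modelled."""

    key: str


class FakeStore:
    """Hypothetical in-memory stand-in; keys are kept sorted for a stable order."""

    def __init__(self, keys: list[str]) -> None:
        self._keys = sorted(keys)

    async def iterate_keys(
        self,
        exclusive_start_key: Optional[str] = None,
        limit: Optional[int] = None,
    ) -> AsyncIterator[FakeRecordMetadata]:
        yielded = 0
        for key in self._keys:
            # Assumption: iteration resumes strictly after exclusive_start_key.
            if exclusive_start_key is not None and key <= exclusive_start_key:
                continue
            if limit is not None and yielded >= limit:
                return
            yield FakeRecordMetadata(key=key)
            yielded += 1


async def keys_with_prefix(store: Any, prefix: str) -> list[str]:
    """Collect the keys that start with `prefix` while streaming over the store."""
    return [item.key async for item in store.iterate_keys() if item.key.startswith(prefix)]


if __name__ == '__main__':
    demo_store = FakeStore(['product-2', 'order-1', 'product-1'])
    print(asyncio.run(keys_with_prefix(demo_store, 'product-')))
```

Streaming with async for avoids building the full key list first, which matters when limit is None and the store is large.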

list_keys

  • async list_keys(exclusive_start_key, limit): list[KeyValueStoreRecordMetadata]
  • List all the existing keys in the KVS.

    It uses the client's iterate_keys method to get the keys.


    Parameters

    • optional exclusive_start_key: str | None = None

      Key to start the iteration from.

    • optional limit: int = 1000

      Maximum number of keys to return.

    Returns list[KeyValueStoreRecordMetadata]
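
Since list_keys caps its result at limit, reading a larger store means requesting pages and passing the last key seen as exclusive_start_key. The following is a minimal sketch of that loop. FakeStore and FakeRecordMetadata are hypothetical stand-ins so the snippet runs without Crawlee; the key attribute and the assumption that listing resumes strictly after exclusive_start_key are not confirmed by this page.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeRecordMetadata:
    """Hypothetical stand-in for KeyValueStoreRecordMetadata; only `key` is modelled."""

    key: str


class FakeStore:
    """Hypothetical in-memory stand-in mirroring the documented list_keys signature."""

    def __init__(self, keys: list[str]) -> None:
        self._keys = sorted(keys)

    async def list_keys(
        self,
        exclusive_start_key: Optional[str] = None,
        limit: int = 1000,
    ) -> list[FakeRecordMetadata]:
        # Assumption: listing resumes strictly after exclusive_start_key.
        keys = [k for k in self._keys if exclusive_start_key is None or k > exclusive_start_key]
        return [FakeRecordMetadata(key=k) for k in keys[:limit]]


async def all_keys(store: Any, page_size: int = 2) -> list[str]:
    """Collect every key by requesting one page at a time."""
    collected: list[str] = []
    start: Optional[str] = None
    while True:
        page = await store.list_keys(exclusive_start_key=start, limit=page_size)
        if not page:
            return collected
        collected.extend(item.key for item in page)
        start = page[-1].key  # resume after the last key seen


if __name__ == '__main__':
    print(asyncio.run(all_keys(FakeStore(['c', 'a', 'b', 'e', 'd']))))
```

The tiny page_size is only there to force several round trips in the demo; in practice the default limit of 1000 is a more sensible page size.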

open

  • async open(*, id, name, configuration, storage_client): Storage
  • Open a storage, either restoring an existing one or creating a new one.


    Parameters

    • optional keyword-only id: str | None = None

      The storage ID.

    • optional keyword-only name: str | None = None

      The storage name.

    • optional keyword-only configuration: Configuration | None = None

      Configuration object used during the storage creation or restoration process.

    • optional keyword-only storage_client: StorageClient | None = None

      Underlying storage client to use. If not provided, the default global storage client from the service locator will be used.

    Returns Storage

persist_autosaved_values

  • async persist_autosaved_values(): None
  • Force autosaved values to be saved immediately, without waiting for an event from the Event Manager.


    Returns None
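
The relationship between get_auto_saved_value and persist_autosaved_values can be modelled roughly as follows. This is a simplified, hypothetical in-memory model (FakeStore is not Crawlee's implementation, and it has no Event Manager, so nothing is saved until persist_autosaved_values is called explicitly). It only illustrates the idea that the returned dictionary is mutated in place and written back later.

```python
import asyncio
import copy
from typing import Any, Optional


class FakeStore:
    """Hypothetical in-memory model of the auto-save idea; not Crawlee's code."""

    def __init__(self) -> None:
        self._records: dict[str, Any] = {}
        self._autosaved: dict[str, dict[str, Any]] = {}

    async def get_value(self, key: str, default_value: Any = None) -> Any:
        return self._records.get(key, default_value)

    async def get_auto_saved_value(
        self, key: str, default_value: Optional[dict[str, Any]] = None
    ) -> dict[str, Any]:
        # Hand out one live dict per key, seeded from the record or the default.
        if key not in self._autosaved:
            stored = self._records.get(key, default_value or {})
            self._autosaved[key] = copy.deepcopy(stored)
        return self._autosaved[key]

    async def persist_autosaved_values(self) -> None:
        # Write a snapshot of every live dict back to the stored records.
        for key, value in self._autosaved.items():
            self._records[key] = copy.deepcopy(value)


async def demo() -> tuple[Any, Any]:
    store = FakeStore()
    state = await store.get_auto_saved_value('crawl-state', {'pages_done': 0})
    state['pages_done'] += 1  # mutate in place; no set_value call is needed
    unsaved = await store.get_value('crawl-state')  # not written in this model yet
    await store.persist_autosaved_values()  # force the save
    saved = await store.get_value('crawl-state')
    return unsaved, saved


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

In the real class the save is normally triggered by an event, so an explicit persist_autosaved_values call is mainly useful when the state must be durable at a specific point, for example right before shutdown.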

purge

  • async purge(): None
  • Purge the storage, removing all items from the underlying storage client.

    This method does not remove the storage itself (for example, its metadata is kept), but clears all items within it.


    Returns None

record_exists

  • async record_exists(key): bool
  • Check if a record with the given key exists in the key-value store.


    Parameters

    • key: str

      Key of the record to check for existence.

    Returns bool
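
A common use of record_exists is to avoid overwriting a record that is already present. The helper below is a minimal sketch of that pattern. FakeStore is a hypothetical in-memory stand-in mirroring the documented signatures so the snippet runs without Crawlee, and set_if_absent and the 'INPUT' key are illustrative names.

```python
import asyncio
from typing import Any, Optional


class FakeStore:
    """Hypothetical in-memory stand-in mirroring the documented signatures."""

    def __init__(self) -> None:
        self._records: dict[str, Any] = {}

    async def record_exists(self, key: str) -> bool:
        return key in self._records

    async def set_value(self, key: str, value: Any, content_type: Optional[str] = None) -> None:
        self._records[key] = value

    async def get_value(self, key: str, default_value: Any = None) -> Any:
        return self._records.get(key, default_value)


async def set_if_absent(store: Any, key: str, value: Any) -> bool:
    """Write the record only when the key is unused; report whether a write happened."""
    if await store.record_exists(key):
        return False
    await store.set_value(key, value)
    return True


async def demo() -> tuple[bool, bool, Any]:
    store = FakeStore()
    first = await set_if_absent(store, 'INPUT', {'start_url': 'https://example.com'})
    second = await set_if_absent(store, 'INPUT', {'start_url': 'https://example.org'})
    return first, second, await store.get_value('INPUT')


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

The check and the write are two separate calls, so this pattern is not atomic: two concurrent writers could both see the key as missing.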

set_value

  • async set_value(key, value, content_type): None
  • Set a value in the KVS.


    Parameters

    • key: str

      Key of the record to set.

    • value: Any

      Value to set.

    • optional content_type: str | None = None

      The MIME content type string.

    Returns None
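
When the value is not JSON, passing content_type explicitly documents what the record holds. The sketch below stores an HTML snapshot this way. FakeStore is a hypothetical in-memory stand-in that simply records the arguments so the snippet runs without Crawlee; save_page, the key, and the chosen MIME string are illustrative.

```python
import asyncio
from typing import Any, Optional


class FakeStore:
    """Hypothetical in-memory stand-in that remembers each value and its content type."""

    def __init__(self) -> None:
        self.records: dict[str, tuple[Any, Optional[str]]] = {}

    async def set_value(self, key: str, value: Any, content_type: Optional[str] = None) -> None:
        self.records[key] = (value, content_type)


async def save_page(store: Any, key: str, html: str) -> None:
    # State the MIME type explicitly so the record is treated as HTML.
    await store.set_value(key, html, content_type='text/html; charset=utf-8')


async def demo() -> tuple[Any, Optional[str]]:
    store = FakeStore()
    await save_page(store, 'snapshot-home', '<h1>Hello</h1>')
    return store.records['snapshot-home']


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

How the real class handles content_type=None is not described on this page, so the sketch does not model any inference of the type from the value.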

Properties

id

id: str

Get the storage ID.

name

name: str | None

Get the storage name.