# Quickstart

## Installation

To install this package:

```
pip install cognite-transformations-cli
```

If the Cognite SDK is not already installed, the installation will automatically fetch and install it as well.
## Usage

### Authenticate with API keys

To use transformations-cli with API key authentication:

- The `TRANSFORMATIONS_API_KEY` environment variable must be set to a valid API key for a service account which has access to Transformations.
- The `TRANSFORMATIONS_PROJECT` environment variable is optional for API key authentication; if skipped, the CDF project can be inferred from the API key.
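For example, in a shell session (the values below are placeholders, not real credentials):

```shell
# Placeholder values -- substitute your own service account API key.
export TRANSFORMATIONS_API_KEY="my-api-key"
# Optional: if omitted, the project is inferred from the API key.
export TRANSFORMATIONS_PROJECT="my-project"
```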
### Authenticate with OIDC credentials

When using OIDC, you need to set the following environment variables:

- `TRANSFORMATIONS_CLIENT_ID`: Required.
- `TRANSFORMATIONS_CLIENT_SECRET`: Required.
- `TRANSFORMATIONS_TOKEN_URL`: Required.
- `TRANSFORMATIONS_PROJECT`: Required.
- `TRANSFORMATIONS_SCOPES`: Generally required to authenticate, except for the Aize project. Space separated for multiple scopes.
- `TRANSFORMATIONS_AUDIENCE`: Optional.
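A shell sketch of an OIDC setup; all values are placeholders (the real token URL and scopes depend on your identity provider and cluster):

```shell
# Placeholder values -- substitute your identity provider's details.
export TRANSFORMATIONS_CLIENT_ID="my-client-id"
export TRANSFORMATIONS_CLIENT_SECRET="my-client-secret"
export TRANSFORMATIONS_TOKEN_URL="https://login.example.com/oauth2/token"
export TRANSFORMATIONS_PROJECT="my-project"
# Space separated when multiple scopes are needed.
export TRANSFORMATIONS_SCOPES="scope-one scope-two"
```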
By default, transformations-cli runs against the main CDF cluster (europe-west1-1). To use a different cluster, specify the `--cluster` parameter or set the `TRANSFORMATIONS_CLUSTER` environment variable. Note that this is a global parameter, which must be specified before the subcommand. For example:

```
transformations-cli --cluster=greenfield <subcommand> [...args]
```
| Command | Args | Options | Description |
|---|---|---|---|
| `list` | | `--limit`, `--interactive`, `--data-set-id`, `--data-set-external-id`, `--destination-type`, `--conflict-mode`, `--tag` | List transformations |
| `show` | | `--id`, `--external-id`, `--job-id` | Show a transformation/job |
| `jobs` | | `--limit`, `--id`, `--external-id`, `--interactive` | List jobs |
| `delete` | | `--id`, `--external-id`, `--delete-schedule` | Delete a transformation |
| `query` | `query` | `--limit`, `--source-limit`, `--infer-schema-limit` | Run a query |
| `run` | | `--id`, `--external-id`, `--watch`, `--watch-only`, `--time-out` | Run a transformation |
| `deploy` | `path` | | Deploy transformations |
### Help

```
transformations-cli --help
transformations-cli <subcommand> --help
```
### transformations-cli list

`transformations-cli list` lists the transformations in a CDF project. The `--limit` option changes the number of transformations to list; it defaults to 10.

```
transformations-cli list
transformations-cli list --limit=2
```

It prints the transformation details in a tabular format.
List options:

| Option | Default | Flag | Required | Multi value | Description |
|---|---|---|---|---|---|
| `--limit` | 10 | No | No | No | Number of transformations to list. Use -1 to list all. |
| `--interactive` | False | Yes | No | No | Show 10 transformations at a time, waiting for a keypress to display the next batch. |
| `--data-set-id` | No | No | No | Yes | Filter transformations by data set ID. |
| `--data-set-external-id` | No | No | No | Yes | Filter transformations by data set external ID. |
| `--destination-type` | No | No | No | No | Filter transformations by destination type: assets, events, timeseries, etc. |
| `--conflict-mode` | No | No | No | No | Filter transformations by conflict mode: upsert, abort, update, delete. |
| `--tag` | No | No | No | Yes | Filter transformations that have the provided tag. |
### transformations-cli show

`transformations-cli show` shows the details of a transformation and/or a transformation job.

At minimum, this command requires either an `--id`, an `--external-id`, or a `--job-id` to be specified:

```
transformations-cli show --id=1234
transformations-cli show --external-id=my-transformation
transformations-cli show --job-id=1
transformations-cli show --external-id=my-transformation --job-id=1
```

It prints the transformation details in a tabular format, including the latest job's metrics and notifications.
| Option | Flag | Required | Description |
|---|---|---|---|
| `--id` | No | No | The id of the transformation to show. |
| `--external-id` | No | No | The external_id of the transformation to show. |
| `--job-id` | No | No | The id of the job to show. Include this to show job details. |
### transformations-cli jobs

`transformations-cli jobs` lists the latest jobs.

You can optionally provide the `id` or `external_id` of the transformation whose jobs you want to list. You can also provide `--limit`, which defaults to 10; use `--limit=-1` to list all.

```
transformations-cli jobs
transformations-cli jobs --limit=2
transformations-cli jobs --id=1234
transformations-cli jobs --external-id=my-transformation
```
| Option | Default | Flag | Required | Description |
|---|---|---|---|---|
| `--limit` | 10 | No | No | Limit for the job history. Use -1 to retrieve all results. |
| `--id` | | No | No | List jobs by transformation id. |
| `--external-id` | | No | No | List jobs by transformation external_id. |
| `--interactive` | False | Yes | No | Show 10 jobs at a time, waiting for a keypress to display the next batch. |
### transformations-cli delete

transformations-cli provides a `delete` subcommand, which deletes a transformation.

At minimum, this command requires either an `--id` or an `--external-id` to be specified:

```
transformations-cli delete --id=1234
transformations-cli delete --external-id=my-transformation
```

You can also specify the `--delete-schedule` flag to delete a scheduled transformation along with its schedule:

```
transformations-cli delete --id=1234 --delete-schedule
```
| Option | Default | Flag | Required | Description |
|---|---|---|---|---|
| `--id` | | No | No | The id of the transformation to delete. |
| `--external-id` | | No | No | The external_id of the transformation to delete. |
| `--delete-schedule` | False | Yes | No | Scheduled transformations cannot be deleted directly; this flag deletes the schedule along with the transformation. |
### Make a query: transformations-cli query

transformations-cli also allows you to run queries:

```
transformations-cli query "select * from _cdf.assets limit 100"
```

This prints the schema and the results.

The query command is intended for previewing your SQL queries and is not designed for large data exports, so a few limits are in place. The query command takes the `--infer-schema-limit`, `--source-limit`, and `--limit` options, which default to 100, 100, and 1000 respectively.
| Arg | Required | Description |
|---|---|---|
| `query` | Yes | SQL query to preview, as a string. |
| Option | Default | Flag | Required | Description |
|---|---|---|---|---|
| `--limit` | 1000 | No | No | Equivalent to a final LIMIT clause on your query. |
| `--source-limit` | 100 | No | No | Limits the number of rows to read from each data source. |
| `--infer-schema-limit` | 100 | No | No | Schema inference limit. |
More details on the source limit and infer schema limit:

- `--source-limit`: Limits the rows read from each source, not the final result. For example, if the source limit is 100 and you take the UNION of two tables, you will get 200 rows back. This parameter is set to 100 by default, but you can remove the limit by setting it to -1.
- `--infer-schema-limit`: As RAW tables have no predefined schema, some number of rows must be read to infer the schema of the table. As with the source limit, this is set to 100 by default and can be made unlimited by setting it to -1. If your RAW data is not being split properly into separate columns, try increasing or removing this limit.

```
transformations-cli query --source-limit=-1 "select * from db.table"
```
### Start a transformation job: transformations-cli run

`transformations-cli run` can start transformation jobs and/or wait for jobs to complete.

At minimum, this command requires either an `--id` or an `--external-id` to be specified:

```
transformations-cli run --id=1234
transformations-cli run --external-id=my-transformation
```

Without any additional arguments, this command starts a transformation job and exits immediately. If you want to wait for the job to complete, use the `--watch` option:

```
transformations-cli run --id=1234 --watch
```

When using the `--watch` option, transformations-cli returns a non-zero exit code if the transformation job failed, or if it did not finish within a given timeout (12 hours by default). This timeout can be configured using the `--time-out` option.

If you want to watch a job for completion without actually starting a transformation job, specify `--watch-only` instead of `--watch`. This watches the most recently started job for completion.
| Option | Default | Flag | Required | Description |
|---|---|---|---|---|
| `--id` | | No | No | The id of the transformation to run. |
| `--external-id` | | No | No | The external_id of the transformation to run. |
| `--watch` | False | Yes | No | Wait until the job has completed. |
| `--watch-only` | False | Yes | No | Do not start a transformation job; only watch the most recent job for completion. |
| `--time-out` | 12 hours (in seconds) | No | No | Maximum amount of time to wait for the job to complete, in seconds. |
### Deploy transformations: transformations-cli deploy

`transformations-cli deploy` is used to create or update transformations described by manifests. The primary purpose of transformations-cli is to support continuous delivery, allowing you to manage transformations in a version control system:

- Transformations are described by YAML files, whose structure is described further below in this document.
- It is recommended to place these manifest files in their own directory, to avoid conflicts with other files.

To deploy a set of transformations, use the deploy subcommand:

```
transformations-cli deploy <path>
```

The `<path>` argument should point to a directory containing YAML manifests. This directory is scanned recursively for `*.yml` and `*.yaml` files, so you can organize your transformations into separate subdirectories.
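As a sketch of the layout this implies (the directory and file names below are hypothetical), standard shell tools can list what a recursive scan for manifests would pick up:

```shell
# A hypothetical manifest tree: subdirectories are fine,
# since deploy scans recursively for *.yml and *.yaml files.
mkdir -p transformations/assets transformations/raw
touch transformations/assets/hierarchy.yaml
touch transformations/raw/events.yml
# Mirror the scan with find:
find transformations -name '*.yaml' -o -name '*.yml'
```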
| Arg | Default | Required | Description |
|---|---|---|---|
| `path` | . | Yes | Root folder of transformation manifests. |
| Option | Default | Flag | Required | Description |
|---|---|---|---|---|
| | Yes | No | No | Print `external_id`s for the upserted resources besides the counts. |
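Since deploy is built for continuous delivery, it is commonly run from CI on every push. A hypothetical GitHub Actions sketch (the workflow name, secret names, and `transformations/` directory are assumptions, not part of this CLI):

```yaml
name: deploy-transformations
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install cognite-transformations-cli
      - run: transformations-cli deploy transformations/
        env:
          TRANSFORMATIONS_CLIENT_ID: ${{ secrets.TRANSFORMATIONS_CLIENT_ID }}
          TRANSFORMATIONS_CLIENT_SECRET: ${{ secrets.TRANSFORMATIONS_CLIENT_SECRET }}
          TRANSFORMATIONS_TOKEN_URL: ${{ secrets.TRANSFORMATIONS_TOKEN_URL }}
          TRANSFORMATIONS_PROJECT: my-project
```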
## Transformation Manifest

Important notes:

- When a scheduled transformation is represented in a manifest without a `schedule` provided, deploy will delete the existing schedule.
- When an existing notification is not provided along with the transformation to be updated, the notification will be deleted.
- Values specified as `${VALUE}` are treated as environment variables, while `VALUE` is used directly as the actual value.
- Old `jetfire-cli` style manifests can be used by adding `legacy: true` inside the old manifest.
```yaml
# Required
externalId: "test-cli-transform-oidc"
# Required
name: "test-cli-transform-oidc"
# Required
# Valid values are: "assets", "timeseries", "asset_hierarchy", "events", "datapoints",
# "string_datapoints", "sequences", "files", "labels", "relationships",
# "raw", "data_sets", "sequence_rows", "nodes", "edges"
destination:
  type: "assets"
# destination: "assets"
# When writing to RAW tables, use the following syntax:
# destination:
#   type: raw
#   database: some_database
#   table: some_table
# When writing to sequence rows, use the following syntax:
# destination:
#   type: sequence_rows
#   externalId: some_sequence
# When writing to nodes in your data model, use the following syntax:
# NOTE: view is optional, not needed for writing nodes without a view
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: nodes
#   instanceSpace: InstanceSpace
#   view:
#     space: TypeSpace
#     externalId: TypeExternalId
#     version: version
# When writing to edges (aka connection definitions) in your data model, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: edges
#   instanceSpace: InstanceSpace
#   edgeType:
#     space: TypeSpace
#     externalId: TypeExternalId
# When writing to edges with a view in your data model, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: edges
#   instanceSpace: InstanceSpace
#   view:
#     space: TypeSpace
#     externalId: TypeExternalId
#     version: version
# When writing instances to a type, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: instances
#   instanceSpace: InstanceSpace
#   dataModel:
#     space: modelSpace
#     externalId: modelExternalId
#     version: modelVersion
#     destination_type: viewExternalId
# When writing instances to a relationship, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: instances
#   instanceSpace: InstanceSpace
#   dataModel:
#     space: modelSpace
#     externalId: modelExternalId
#     version: modelVersion
#     destination_type: viewExternalId
#     destination_relationship_from_type: connectionPropertyName
# Optional, default: true
shared: true
# Optional, default: upsert
# Valid values are:
#   upsert: Create new items, or update existing items if their id or externalId
#           already exists.
#   create: Create new items. The transformation will fail if there are id or
#           externalId conflicts.
#   update: Update existing items. The transformation will fail if an id or
#           externalId does not exist.
#   delete: Delete items by internal id.
action: "upsert"
# Required
query: "select 'My Assets Transformation' as name, 'asset1' as externalId"
# Or the path to a file containing the SQL query for this transformation.
# query:
#   file: query.sql
# Optional, default: null
# If null, the transformation will not be scheduled.
schedule: "* * * * *"
# Or you can pause the schedule.
# schedule:
#   interval: "* * * * *"
#   isPaused: true
# Optional, default: true
ignoreNullFields: false
# Optional, default: null
# List of email addresses to send emails to on transformation errors
notifications:
  - [email protected]
  - [email protected]
# Optional, default: null
# Skipping this field or providing null clears
# the data set ID on updating the transformation
dataSetId: 1
# Or you can provide the data set external ID instead.
# Optional, default: null
# Skipping this field or providing null clears
# the data set ID on updating the transformation
dataSetExternalId: test-dataset
# Optional: You can tag your transformations with a maximum of 5 tags.
tags:
  - mytag1
  - mytag2
# The client credentials to be used in the transformation
authentication:
  clientId: ${CLIENT_ID}
  clientSecret: ${CLIENT_SECRET}
  tokenUrl: ${TOKEN_URL}
  scopes:
    - ${SCOPES}
  cdfProjectName: ${CDF_PROJECT_NAME}
  # audience: ""
# If you need to specify read/write credentials separately:
# authentication:
#   read:
#     clientId: ${CLIENT_ID}
#     clientSecret: ${CLIENT_SECRET}
#     tokenUrl: ${TOKEN_URL}
#     scopes:
#       - ${SCOPES}
#     cdfProjectName: ${CDF_PROJECT_NAME}
#     # audience: ""
#   write:
#     clientId: ${CLIENT_ID}
#     clientSecret: ${CLIENT_SECRET}
#     tokenUrl: ${TOKEN_URL}
#     scopes:
#       - ${SCOPES}
#     cdfProjectName: ${CDF_PROJECT_NAME}
#     # audience: ""
```
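For contrast with the exhaustive annotated example above, here is a sketch of a minimal manifest keeping only the fields marked Required, plus the client credentials block (all values are illustrative):

```yaml
externalId: "minimal-asset-transform"
name: "minimal-asset-transform"
destination:
  type: "assets"
query: "select 'My Assets Transformation' as name, 'asset1' as externalId"
authentication:
  clientId: ${CLIENT_ID}
  clientSecret: ${CLIENT_SECRET}
  tokenUrl: ${TOKEN_URL}
  scopes:
    - ${SCOPES}
  cdfProjectName: ${CDF_PROJECT_NAME}
```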