# Quickstart

## Installation

To install this package:

```
pip install cognite-transformations-cli
```

If the Cognite SDK is not already installed, the installation will automatically fetch and install it as well.
## Usage

### Authenticate with API keys

To use transformations-cli with API keys:

- The `TRANSFORMATIONS_API_KEY` environment variable must be set to a valid API key for a service account which has access to Transformations.
- The `TRANSFORMATIONS_PROJECT` environment variable is optional for API key authentication; if omitted, the CDF project is inferred from the API key.
### Authenticate with OIDC credentials

When using OIDC, you need to set the following environment variables:

- `TRANSFORMATIONS_CLIENT_ID`: Required
- `TRANSFORMATIONS_CLIENT_SECRET`: Required
- `TRANSFORMATIONS_TOKEN_URL`: Required
- `TRANSFORMATIONS_PROJECT`: Required
- `TRANSFORMATIONS_SCOPES`: Treated as optional by the CLI, but generally required to authenticate (except for the Aize project). Space separated for multiple scopes.
- `TRANSFORMATIONS_AUDIENCE`: Optional
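Before invoking the CLI, it can be useful to verify that the required variables are present. A minimal pre-flight sketch (the helper is my own illustration, not part of the CLI; the variable names are the ones listed above):

```python
import os

# The four variables documented as required for OIDC authentication.
REQUIRED_OIDC_VARS = [
    "TRANSFORMATIONS_CLIENT_ID",
    "TRANSFORMATIONS_CLIENT_SECRET",
    "TRANSFORMATIONS_TOKEN_URL",
    "TRANSFORMATIONS_PROJECT",
]

def missing_oidc_vars(env=os.environ):
    """Return the names of required OIDC variables that are not set."""
    return [name for name in REQUIRED_OIDC_VARS if name not in env]

if missing_oidc_vars():
    print("Missing:", ", ".join(missing_oidc_vars()))
```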
By default, transformations-cli runs against the main CDF cluster (europe-west1-1). To use a different cluster, specify the `--cluster` parameter or set the `TRANSFORMATIONS_CLUSTER` environment variable. Note that this is a global parameter, which must be specified before the subcommand. For example:

```
transformations-cli --cluster=greenfield <subcommand> [...args]
```
Command | Args | Options | Description
---|---|---|---
`list` | | | List transformations
`show` | | | Show a transformation/job
`jobs` | | | List jobs
`delete` | | | Delete a transformation
`query` | | | Run a query
`run` | | | Run a transformation
`deploy` | | | Deploy transformations
### Help

```
transformations-cli --help
transformations-cli <subcommand> --help
```
### transformations-cli list

`transformations-cli list` lists the transformations in a CDF project. The `--limit` option changes the number of transformations to list; it defaults to 10.

```
transformations-cli list
transformations-cli list --limit=2
```

It prints the transformation details in a tabular format.
Option | Default | Flag | Required | Multi value | Description
---|---|---|---|---|---
`--limit` | 10 | No | No | No | Number of transformations to list. Use -1 to list all.
`--interactive` | False | Yes | No | No | Show 10 transformations at a time, waiting for keypress to display next batch.
`--data-set-id` | No | No | No | Yes | Filter transformations by data set ID.
`--data-set-external-id` | No | No | No | Yes | Filter transformations by data set external ID.
`--destination-type` | No | No | No | No | Filter transformations by destination type: assets, events, timeseries…
`--conflict-mode` | No | No | No | No | Filter transformations by conflict mode: upsert, abort, update, delete.
`--tag` | No | No | No | Yes | Filter transformations that have the provided tag.
### transformations-cli show

`transformations-cli show` shows the details of a transformation and/or a transformation job.

At minimum, this command requires either an `--id`, an `--external-id`, or a `--job-id` to be specified:

```
transformations-cli show --id=1234
transformations-cli show --external-id=my-transformation
transformations-cli show --job-id=1
transformations-cli show --external-id=my-transformation --job-id=1
```

It prints the transformation details in a tabular format, including the latest job's metrics and notifications.
Option | Flag | Required | Description
---|---|---|---
`--id` | No | No | The id of the transformation to show.
`--external-id` | No | No | The external id of the transformation to show.
`--job-id` | No | No | The id of the job to show. Include this to show job details.
### transformations-cli jobs

`transformations-cli jobs` lists the latest jobs. You can optionally provide the `external_id` or `id` of the transformation whose jobs you want to list. You can also provide `--limit`, which defaults to 10; use `--limit=-1` to list all.

```
transformations-cli jobs
transformations-cli jobs --limit=2
transformations-cli jobs --id=1234
transformations-cli jobs --external-id=my-transformation
```
Option | Default | Flag | Required | Description
---|---|---|---|---
`--limit` | 10 | No | No | Limit for the job history. Use -1 to retrieve all results.
`--id` | | No | No | List jobs by transformation id.
`--external-id` | | No | No | List jobs by transformation external id.
`--interactive` | False | Yes | No | Show 10 jobs at a time, waiting for keypress to display next batch.
### transformations-cli delete

`transformations-cli` provides a delete subcommand, which can delete a transformation.

At minimum, this command requires either an `--id` or an `--external-id` to be specified:

```
transformations-cli delete --id=1234
transformations-cli delete --external-id=my-transformation
```

You can also specify the `--delete-schedule` flag to delete a scheduled transformation:

```
transformations-cli delete --id=1234 --delete-schedule
```
Option | Default | Flag | Required | Description
---|---|---|---|---
`--id` | | No | No | The id of the transformation to delete.
`--external-id` | | No | No | The external id of the transformation to delete.
`--delete-schedule` | False | Yes | No | Scheduled transformations cannot be deleted directly; this flag deletes the schedule along with the transformation.
### Make a query: transformations-cli query

transformations-cli also allows you to run queries:

```
transformations-cli query "select * from _cdf.assets limit 100"
```

This will print the schema and the results.

The query command is intended for previewing your SQL queries and is not designed for large data exports. For this reason, there are a few limits in place. The query command takes the `--infer-schema-limit`, `--source-limit`, and `--limit` options, whose default values are 100, 100, and 1000 respectively.
Arg | Required | Description
---|---|---
`query` | Yes | SQL query to preview, as a string.
Option | Default | Flag | Required | Description
---|---|---|---|---
`--limit` | 1000 | No | No | This is equivalent to a final LIMIT clause on your query.
`--source-limit` | 100 | No | No | This limits the number of rows to read from each data source.
`--infer-schema-limit` | 100 | No | No | Schema inference limit.
More details on the source limit and infer schema limit:

- `--source-limit`: For example, if the source limit is 100 and you take the UNION of two tables, you will get 200 rows back. This parameter is set to 100 by default, but you can remove the limit by setting it to -1.
- `--infer-schema-limit`: As RAW tables have no predefined schema, some number of rows must be read to infer the schema of the table. As with the source limit, this is set to 100 by default and can be made unlimited by setting it to -1. If your RAW data is not being split into separate columns properly, try increasing or removing this limit.

```
transformations-cli query --source-limit=-1 "select * from db.table"
```
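To make the interaction concrete: each data source is capped by `--source-limit` independently, and `--limit` is applied to the final result. A small sketch of the resulting upper bound (the helper is hypothetical, my own illustration of the documented behaviour):

```python
def max_result_rows(n_sources: int, source_limit: int = 100, final_limit: int = 1000) -> int:
    """Upper bound on rows returned when UNIONing n_sources tables."""
    if source_limit == -1:
        # -1 removes the per-source cap, so only the final limit applies.
        return final_limit
    return min(n_sources * source_limit, final_limit)

# UNION of two tables at the default source limit of 100, as in the example above:
print(max_result_rows(2))
```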
### Start a transformation job: transformations-cli run

`transformations-cli run` can start transformation jobs and/or wait for jobs to complete.

At minimum, this command requires either an `--id` or an `--external-id` to be specified:

```
transformations-cli run --id=1234
transformations-cli run --external-id=my-transformation
```

Without any additional arguments, this command starts a transformation job and exits immediately. If you want to wait for the job to complete, use the `--watch` option:

```
transformations-cli run --id=1234 --watch
```

When using the `--watch` option, transformations-cli returns a non-zero exit code if the transformation job failed, or if it did not finish within a given timeout (12 hours by default). This timeout can be configured with the `--time-out` option.

If you want to watch a job for completion without actually starting a transformation job, specify `--watch-only` instead of `--watch`. This watches the most recently started job for completion.
Option | Default | Flag | Required | Description
---|---|---|---|---
`--id` | | No | No | The id of the transformation to run.
`--external-id` | | No | No | The external id of the transformation to run.
`--watch` | False | Yes | No | Wait until the job has completed.
`--watch-only` | False | Yes | No | Do not start a transformation job, only watch the most recent job for completion.
`--time-out` | 12 hr (in secs) | No | No | Maximum amount of time to wait for the job to complete, in seconds.
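In CI scripts it is common to assemble the run invocation programmatically and rely on the exit code described above. A sketch (the helper is hypothetical; the flags and the 12-hour default of 43200 seconds are the ones documented here):

```python
def build_run_command(external_id: str, watch: bool = True, time_out: int = 43200) -> list:
    """Build a transformations-cli run invocation (43200 s = the 12-hour default)."""
    cmd = ["transformations-cli", "run", f"--external-id={external_id}"]
    if watch:
        # With --watch, the CLI exits non-zero on job failure or timeout.
        cmd += ["--watch", f"--time-out={time_out}"]
    return cmd

# The result can be passed to subprocess.run(...) and its returncode checked.
print(build_run_command("my-transformation"))
```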
### Deploy transformations: transformations-cli deploy

`transformations-cli deploy` is used to create or update transformations described by manifests. The primary purpose of transformations-cli is to support continuous delivery, allowing you to manage transformations in a version control system:

- Transformations are described by YAML files, whose structure is described further below in this document.
- It is recommended to place these manifest files in their own directory, to avoid conflicts with other files.

To deploy a set of transformations, use the deploy subcommand:

```
transformations-cli deploy <path>
```

The `<path>` argument should point to a directory containing YAML manifests. This directory is scanned recursively for `*.yml` and `*.yaml` files, so you can organize your transformations into separate subdirectories.
Arg | Default | Required | Description
---|---|---|---
`path` | . | Yes | Root folder of transformation manifests.
Option | Default | Flag | Required | Description
---|---|---|---|---
 | Yes | No | No | Print `external_id`s for the upserted resources besides the counts.
### Transformation Manifest

Important notes:

- When a scheduled transformation is represented in a manifest without `schedule` provided, deploy will delete the existing schedule.
- When an existing notification is not provided along with the transformation to be updated, the notification will be deleted.
- Values specified as `${VALUE}` are treated as environment variables, while `VALUE` is used directly as the actual value.
- Old jetfire-cli style manifests can be used by adding `legacy: true` inside the old manifest.
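The `${VALUE}` convention can be illustrated with a small sketch (my own illustration of the documented substitution rule, not the CLI's actual implementation):

```python
import os
import re

def resolve(value: str) -> str:
    """Return the environment variable for "${NAME}" values;
    any other value is used directly, as the docs describe."""
    match = re.fullmatch(r"\$\{(\w+)\}", value)
    return os.environ[match.group(1)] if match else value

os.environ["CLIENT_ID"] = "my-client-id"
print(resolve("${CLIENT_ID}"))   # read from the environment
print(resolve("literal-value"))  # used directly as the actual value
```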
```yaml
# Required
externalId: "test-cli-transform-oidc"
# Required
name: "test-cli-transform-oidc"
# Required
# Valid values are: "assets", "timeseries", "asset_hierarchy", "events", "datapoints",
# "string_datapoints", "sequences", "files", "labels", "relationships",
# "raw", "data_sets", "sequence_rows", "nodes", "edges"
destination:
  type: "assets"
# destination: "assets"
# When writing to RAW tables, use the following syntax:
# destination:
#   type: raw
#   database: some_database
#   table: some_table
# When writing to sequence rows, use the following syntax:
# destination:
#   type: sequence_rows
#   externalId: some_sequence
# When writing to nodes in your data model, use the following syntax:
# NOTE: view is optional, not needed for writing nodes without a view
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: nodes
#   instanceSpace: InstanceSpace
#   view:
#     space: TypeSpace
#     externalId: TypeExternalId
#     version: version
# When writing to edges (aka connection definitions) in your data model, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: edges
#   instanceSpace: InstanceSpace
#   edgeType:
#     space: TypeSpace
#     externalId: TypeExternalId
# When writing to edges with a view in your data model, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: edges
#   instanceSpace: InstanceSpace
#   view:
#     space: TypeSpace
#     externalId: TypeExternalId
#     version: version
# When writing instances to a type, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: instances
#   instanceSpace: InstanceSpace
#   dataModel:
#     space: modelSpace
#     externalId: modelExternalId
#     version: modelVersion
#     destination_type: viewExternalId
# When writing instances to a relationship, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: instances
#   instanceSpace: InstanceSpace
#   dataModel:
#     space: modelSpace
#     externalId: modelExternalId
#     version: modelVersion
#     destination_type: viewExternalId
#     destination_relationship_from_type: connectionPropertyName
# Optional, default: true
shared: true
# Optional, default: upsert
# Valid values are:
#   upsert: Create new items, or update existing items if their id or externalId
#           already exists.
#   create: Create new items. The transformation will fail if there are id or
#           externalId conflicts.
#   update: Update existing items. The transformation will fail if an id or
#           externalId does not exist.
#   delete: Delete items by internal id.
action: "upsert"
# Required
query: "select 'My Assets Transformation' as name, 'asset1' as externalId"
# Or the path to a file containing the SQL query for this transformation.
# query:
#   file: query.sql
# Optional, default: null
# If null, the transformation will not be scheduled.
schedule: "* * * * *"
# Or you can pause the schedule.
# schedule:
#   interval: "* * * * *"
#   isPaused: true
# Optional, default: true
ignoreNullFields: false
# Optional, default: null
# List of email addresses to send emails to on transformation errors
notifications:
  - [email protected]
  - [email protected]
# Optional, default: null
# Skipping this field or providing null clears
# the data set ID when updating the transformation
dataSetId: 1
# Or you can provide the data set external ID instead.
# Optional, default: null
# Skipping this field or providing null clears
# the data set ID when updating the transformation
dataSetExternalId: test-dataset
# Optional: you can tag your transformations with max 5 tags.
tags:
  - mytag1
  - mytag2
# The client credentials to be used in the transformation
authentication:
  clientId: ${CLIENT_ID}
  clientSecret: ${CLIENT_SECRET}
  tokenUrl: ${TOKEN_URL}
  scopes:
    - ${SCOPES}
  cdfProjectName: ${CDF_PROJECT_NAME}
  # audience: ""
# If you need to specify read/write credentials separately:
# authentication:
#   read:
#     clientId: ${CLIENT_ID}
#     clientSecret: ${CLIENT_SECRET}
#     tokenUrl: ${TOKEN_URL}
#     scopes:
#       - ${SCOPES}
#     cdfProjectName: ${CDF_PROJECT_NAME}
#     # audience: ""
#   write:
#     clientId: ${CLIENT_ID}
#     clientSecret: ${CLIENT_SECRET}
#     tokenUrl: ${TOKEN_URL}
#     scopes:
#       - ${SCOPES}
#     cdfProjectName: ${CDF_PROJECT_NAME}
#     # audience: ""
```