# Quickstart

## Installation

To install this package:

```
pip install cognite-transformations-cli
```

If the Cognite SDK is not already installed, the installation will automatically fetch and install it as well.
## Usage

### Authenticate with API keys

To use transformations-cli with API keys:

- The `TRANSFORMATIONS_API_KEY` environment variable must be set to a valid API key for a service account which has access to Transformations.
- The `TRANSFORMATIONS_PROJECT` environment variable is optional for API key authentication; if omitted, the CDF project is inferred from the API key.
### Authenticate with OIDC credentials

When using OIDC, you need to set the following environment variables:

- `TRANSFORMATIONS_CLIENT_ID`: Required
- `TRANSFORMATIONS_CLIENT_SECRET`: Required
- `TRANSFORMATIONS_TOKEN_URL`: Required
- `TRANSFORMATIONS_PROJECT`: Required
- `TRANSFORMATIONS_SCOPES`: Treated as optional by the CLI, but generally required to authenticate (except for the Aize project). Space separated for multiple scopes.
- `TRANSFORMATIONS_AUDIENCE`: Optional
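Before invoking the CLI, it can be useful to verify that the required variables are present. A minimal pre-flight sketch (the helper is my own illustration, not part of the CLI; the variable names are the ones listed above):

```python
import os

# The four variables documented as required for OIDC authentication.
REQUIRED_OIDC_VARS = [
    "TRANSFORMATIONS_CLIENT_ID",
    "TRANSFORMATIONS_CLIENT_SECRET",
    "TRANSFORMATIONS_TOKEN_URL",
    "TRANSFORMATIONS_PROJECT",
]

def missing_oidc_vars(env=os.environ):
    """Return the names of required OIDC variables that are not set."""
    return [name for name in REQUIRED_OIDC_VARS if name not in env]

if missing_oidc_vars():
    print("Missing:", ", ".join(missing_oidc_vars()))
```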
By default, transformations-cli runs against the main CDF cluster (europe-west1-1). To use a different cluster, specify the `--cluster` parameter or set the `TRANSFORMATIONS_CLUSTER` environment variable. Note that this is a global parameter, which must be specified before the subcommand. For example:

```
transformations-cli --cluster=greenfield <subcommand> [...args]
```
Command | Args | Options | Description
---|---|---|---
`list` | | | List transformations
`show` | | | Show a transformation/job
`jobs` | | | List jobs
`delete` | | | Delete a transformation
`query` | | | Run a query
`run` | | | Run a transformation
`deploy` | | | Deploy transformations
### Help

```
transformations-cli --help
transformations-cli <subcommand> --help
```
### transformations-cli list

`transformations-cli list` lists the transformations in a CDF project. The `--limit` option changes the number of transformations to list; it defaults to 10.

```
transformations-cli list
transformations-cli list --limit=2
```

It prints the transformation details in a tabular format.
Option | Default | Flag | Required | Multi value | Description
---|---|---|---|---|---
`--limit` | 10 | No | No | No | Number of transformations to list. Use -1 to list all.
`--interactive` | False | Yes | No | No | Show 10 transformations at a time, waiting for keypress to display next batch.
`--data-set-id` | No | No | No | Yes | Filter transformations by data set ID.
`--data-set-external-id` | No | No | No | Yes | Filter transformations by data set external ID.
`--destination-type` | No | No | No | No | Filter transformations by destination type: assets, events, timeseries…
`--conflict-mode` | No | No | No | No | Filter transformations by conflict mode: upsert, abort, update, delete.
`--tag` | No | No | No | Yes | Filter transformations that have the provided tag.
### transformations-cli show

`transformations-cli show` shows the details of a transformation and/or a transformation job.

At minimum, this command requires either an `--id`, an `--external-id`, or a `--job-id` to be specified:

```
transformations-cli show --id=1234
transformations-cli show --external-id=my-transformation
transformations-cli show --job-id=1
transformations-cli show --external-id=my-transformation --job-id=1
```

It prints the transformation details in a tabular format, including the latest job's metrics and notifications.
Option | Flag | Required | Description
---|---|---|---
`--id` | No | No | The id of the transformation to show.
`--external-id` | No | No | The external id of the transformation to show.
`--job-id` | No | No | The id of the job to show. Include this to show job details.
### transformations-cli jobs

`transformations-cli jobs` lists the latest jobs. You can optionally provide the `external_id` or `id` of the transformation whose jobs you want to list. You can also provide `--limit`, which defaults to 10; use `--limit=-1` to list all.

```
transformations-cli jobs
transformations-cli jobs --limit=2
transformations-cli jobs --id=1234
transformations-cli jobs --external-id=my-transformation
```
Option | Default | Flag | Required | Description
---|---|---|---|---
`--limit` | 10 | No | No | Limit for the job history. Use -1 to retrieve all results.
`--id` | | No | No | List jobs by transformation id.
`--external-id` | | No | No | List jobs by transformation external id.
`--interactive` | False | Yes | No | Show 10 jobs at a time, waiting for keypress to display next batch.
### transformations-cli delete

`transformations-cli` provides a delete subcommand, which can delete a transformation.

At minimum, this command requires either an `--id` or an `--external-id` to be specified:

```
transformations-cli delete --id=1234
transformations-cli delete --external-id=my-transformation
```

You can also specify the `--delete-schedule` flag to delete a scheduled transformation:

```
transformations-cli delete --id=1234 --delete-schedule
```
Option | Default | Flag | Required | Description
---|---|---|---|---
`--id` | | No | No | The id of the transformation to delete.
`--external-id` | | No | No | The external id of the transformation to delete.
`--delete-schedule` | False | Yes | No | Scheduled transformations cannot be deleted directly; this flag deletes the schedule along with the transformation.
### Make a query: transformations-cli query

transformations-cli also allows you to run queries:

```
transformations-cli query "select * from _cdf.assets limit 100"
```

This will print the schema and the results.

The query command is intended for previewing your SQL queries and is not designed for large data exports. For this reason, there are a few limits in place. The query command takes the `--infer-schema-limit`, `--source-limit`, and `--limit` options, whose default values are 100, 100, and 1000 respectively.
Arg | Required | Description
---|---|---
`query` | Yes | SQL query to preview, as a string.
Option | Default | Flag | Required | Description
---|---|---|---|---
`--limit` | 1000 | No | No | This is equivalent to a final LIMIT clause on your query.
`--source-limit` | 100 | No | No | This limits the number of rows to read from each data source.
`--infer-schema-limit` | 100 | No | No | Schema inference limit.
More details on the source limit and infer schema limit:

- `--source-limit`: For example, if the source limit is 100 and you take the UNION of two tables, you will get 200 rows back. This parameter is set to 100 by default, but you can remove the limit by setting it to -1.
- `--infer-schema-limit`: As RAW tables have no predefined schema, some number of rows must be read to infer the schema of the table. As with the source limit, this is set to 100 by default and can be made unlimited by setting it to -1. If your RAW data is not being split into separate columns properly, try increasing or removing this limit.

```
transformations-cli query --source-limit=-1 "select * from db.table"
```
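To make the interaction concrete: each data source is capped by `--source-limit` independently, and `--limit` is applied to the final result. A small sketch of the resulting upper bound (the helper is hypothetical, my own illustration of the documented behaviour):

```python
def max_result_rows(n_sources: int, source_limit: int = 100, final_limit: int = 1000) -> int:
    """Upper bound on rows returned when UNIONing n_sources tables."""
    if source_limit == -1:
        # -1 removes the per-source cap, so only the final limit applies.
        return final_limit
    return min(n_sources * source_limit, final_limit)

# UNION of two tables at the default source limit of 100, as in the example above:
print(max_result_rows(2))
```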
### Start a transformation job: transformations-cli run

`transformations-cli run` can start transformation jobs and/or wait for jobs to complete.

At minimum, this command requires either an `--id` or an `--external-id` to be specified:

```
transformations-cli run --id=1234
transformations-cli run --external-id=my-transformation
```

Without any additional arguments, this command starts a transformation job and exits immediately. If you want to wait for the job to complete, use the `--watch` option:

```
transformations-cli run --id=1234 --watch
```

When using the `--watch` option, transformations-cli returns a non-zero exit code if the transformation job failed, or if it did not finish within a given timeout (12 hours by default). This timeout can be configured with the `--time-out` option.

If you want to watch a job for completion without actually starting a transformation job, specify `--watch-only` instead of `--watch`. This watches the most recently started job for completion.
Option | Default | Flag | Required | Description
---|---|---|---|---
`--id` | | No | No | The id of the transformation to run.
`--external-id` | | No | No | The external id of the transformation to run.
`--watch` | False | Yes | No | Wait until the job has completed.
`--watch-only` | False | Yes | No | Do not start a transformation job, only watch the most recent job for completion.
`--time-out` | 12 hr (in secs) | No | No | Maximum amount of time to wait for the job to complete, in seconds.
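In CI scripts it is common to assemble the run invocation programmatically and rely on the exit code described above. A sketch (the helper is hypothetical; the flags and the 12-hour default of 43200 seconds are the ones documented here):

```python
def build_run_command(external_id: str, watch: bool = True, time_out: int = 43200) -> list:
    """Build a transformations-cli run invocation (43200 s = the 12-hour default)."""
    cmd = ["transformations-cli", "run", f"--external-id={external_id}"]
    if watch:
        # With --watch, the CLI exits non-zero on job failure or timeout.
        cmd += ["--watch", f"--time-out={time_out}"]
    return cmd

# The result can be passed to subprocess.run(...) and its returncode checked.
print(build_run_command("my-transformation"))
```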
### Deploy transformations: transformations-cli deploy

`transformations-cli deploy` is used to create or update transformations described by manifests. The primary purpose of transformations-cli is to support continuous delivery, allowing you to manage transformations in a version control system:

- Transformations are described by YAML files, whose structure is described further below in this document.
- It is recommended to place these manifest files in their own directory, to avoid conflicts with other files.

To deploy a set of transformations, use the deploy subcommand:

```
transformations-cli deploy <path>
```

The `<path>` argument should point to a directory containing YAML manifests. This directory is scanned recursively for `*.yml` and `*.yaml` files, so you can organize your transformations into separate subdirectories.
Arg | Default | Required | Description
---|---|---|---
`path` | . | Yes | Root folder of transformation manifests.
Option | Default | Flag | Required | Description
---|---|---|---|---
 | Yes | No | No | Print `external_id`s for the upserted resources besides the counts.
### Transformation Manifest

Important notes:

- When a scheduled transformation is represented in a manifest without `schedule` provided, deploy will delete the existing schedule.
- When an existing notification is not provided along with the transformation to be updated, the notification will be deleted.
- Values specified as `${VALUE}` are treated as environment variables, while `VALUE` is used directly as the actual value.
- Old jetfire-cli style manifests can be used by adding `legacy: true` inside the old manifest.
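The `${VALUE}` convention can be illustrated with a small sketch (my own illustration of the documented substitution rule, not the CLI's actual implementation):

```python
import os
import re

def resolve(value: str) -> str:
    """Return the environment variable for "${NAME}" values;
    any other value is used directly, as the docs describe."""
    match = re.fullmatch(r"\$\{(\w+)\}", value)
    return os.environ[match.group(1)] if match else value

os.environ["CLIENT_ID"] = "my-client-id"
print(resolve("${CLIENT_ID}"))   # read from the environment
print(resolve("literal-value"))  # used directly as the actual value
```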
```yaml
# Required
externalId: "test-cli-transform-oidc"
# Required
name: "test-cli-transform-oidc"
# Required
# Valid values are: "assets", "timeseries", "asset_hierarchy", "events", "datapoints",
# "string_datapoints", "sequences", "files", "labels", "relationships",
# "raw", "data_sets", "sequence_rows", "nodes", "edges"
destination:
  type: "assets"
# destination: "assets"
# When writing to RAW tables, use the following syntax:
# destination:
#   type: raw
#   database: some_database
#   table: some_table
# When writing to sequence rows, use the following syntax:
# destination:
#   type: sequence_rows
#   externalId: some_sequence
# When writing to nodes in your data model, use the following syntax:
# NOTE: view is optional, not needed for writing nodes without a view
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: nodes
#   instanceSpace: InstanceSpace
#   view:
#     space: TypeSpace
#     externalId: TypeExternalId
#     version: version
# When writing to edges (aka connection definitions) in your data model, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: edges
#   instanceSpace: InstanceSpace
#   edgeType:
#     space: TypeSpace
#     externalId: TypeExternalId
# When writing to edges with a view in your data model, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: edges
#   instanceSpace: InstanceSpace
#   view:
#     space: TypeSpace
#     externalId: TypeExternalId
#     version: version
# When writing instances to a type, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: instances
#   instanceSpace: InstanceSpace
#   dataModel:
#     space: modelSpace
#     externalId: modelExternalId
#     version: modelVersion
#     destination_type: viewExternalId
# When writing instances to a relationship, use the following syntax:
# NOTE: instanceSpace is optional. If not set, it is a mandatory property (column) in the data
# destination:
#   type: instances
#   instanceSpace: InstanceSpace
#   dataModel:
#     space: modelSpace
#     externalId: modelExternalId
#     version: modelVersion
#     destination_type: viewExternalId
#     destination_relationship_from_type: connectionPropertyName
# Optional, default: true
shared: true
# Optional, default: upsert
# Valid values are:
#   upsert: Create new items, or update existing items if their id or externalId
#           already exists.
#   create: Create new items. The transformation will fail if there are id or
#           externalId conflicts.
#   update: Update existing items. The transformation will fail if an id or
#           externalId does not exist.
#   delete: Delete items by internal id.
action: "upsert"
# Required
query: "select 'My Assets Transformation' as name, 'asset1' as externalId"
# Or the path to a file containing the SQL query for this transformation.
# query:
#   file: query.sql
# Optional, default: null
# If null, the transformation will not be scheduled.
schedule: "* * * * *"
# Or you can pause the schedule.
# schedule:
#   interval: "* * * * *"
#   isPaused: true
# Optional, default: true
ignoreNullFields: false
# Optional, default: null
# List of email addresses to send emails to on transformation errors
notifications:
  - [email protected]
  - [email protected]
# Optional, default: null
# Skipping this field or providing null clears
# the data set ID when updating the transformation
dataSetId: 1
# Or you can provide the data set external ID instead.
# Optional, default: null
# Skipping this field or providing null clears
# the data set ID when updating the transformation
dataSetExternalId: test-dataset
# Optional: you can tag your transformations with max 5 tags.
tags:
  - mytag1
  - mytag2
# The client credentials to be used in the transformation
authentication:
  clientId: ${CLIENT_ID}
  clientSecret: ${CLIENT_SECRET}
  tokenUrl: ${TOKEN_URL}
  scopes:
    - ${SCOPES}
  cdfProjectName: ${CDF_PROJECT_NAME}
  # audience: ""
# If you need to specify read/write credentials separately:
# authentication:
#   read:
#     clientId: ${CLIENT_ID}
#     clientSecret: ${CLIENT_SECRET}
#     tokenUrl: ${TOKEN_URL}
#     scopes:
#       - ${SCOPES}
#     cdfProjectName: ${CDF_PROJECT_NAME}
#     # audience: ""
#   write:
#     clientId: ${CLIENT_ID}
#     clientSecret: ${CLIENT_SECRET}
#     tokenUrl: ${TOKEN_URL}
#     scopes:
#       - ${SCOPES}
#     cdfProjectName: ${CDF_PROJECT_NAME}
#     # audience: ""
```