Add a File Definition

First, open the File Definitions page from the dropdown menu at the top of the application.

[Image: file definitions link]

The steps in this documentation require at least BI Developer permissions.

Clicking the “Add File Definition” button takes you to the file definition form, a single-page form with only a few required fields.

If you don’t completely understand the purpose of file definitions, the File Definitions Concepts page provides an overview of what file definitions do and how they are configured.

File Definition Fields

Name

Each file definition requires a name. The name acts as a unique label that makes it easy to identify what the file definition is used for.

Type

File definitions require a type to state what the definition is being used for.

Data Governor currently supports three types of file definitions:

  • File System
  • Azure Blob Folder
  • HDFS/DBFS

To learn more about these types, visit the File Definitions Concepts page.

Path

This is the path to the folder which will be used for the file definition. The format of this depends on the type of file definition.

Formatting File System Paths

File System definitions require the full path to the folder that will contain the files you’re working with. Note that Data Governor supports both Linux and Windows style paths (for example, C:\CSVData and /CSVData are treated as the same path).

Data Governor accepts the same paths you would use in a program like File Explorer, so copying the path from the address bar is an easy way to get it.
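As a rough illustration of how a Windows style and a Linux style path can refer to the same folder, the sketch below uses Python's pathlib to reduce both to the same folder names. The helper name folder_components is hypothetical, and this is not Data Governor's own path handling, only an assumption about what treating both styles alike could look like.

```python
from pathlib import PureWindowsPath


def folder_components(path: str) -> list[str]:
    """Split a path written in either style into its folder names.

    PureWindowsPath accepts both backslashes and forward slashes as
    separators, so it can parse Windows and Linux style paths alike.
    """
    parsed = PureWindowsPath(path)
    parts = list(parsed.parts)
    # Drop the anchor (drive and/or root, such as "C:\" or "\") if present.
    if parsed.anchor:
        parts = parts[1:]
    return parts


print(folder_components(r"C:\CSVData"))  # ['CSVData']
print(folder_components("/CSVData"))     # ['CSVData']
```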

Formatting Azure Blob Paths

The path for an Azure Blob file definition is just the name of the folder you wish to work in. For example, if you had a container with a folder called Data, then the path for the definition would just be Data.
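In a blob container without a hierarchical namespace, a folder is usually just a shared prefix in the blob names. The sketch below shows, under that assumption, which blobs a folder path of Data would cover. The blob names and the helper blobs_in_folder are hypothetical, and no Azure SDK is involved.

```python
def blobs_in_folder(blob_names: list[str], folder: str) -> list[str]:
    """Return the blob names that sit under the given folder prefix."""
    # The trailing slash stops "Data" from also matching "DataBackup/...".
    prefix = folder.strip("/") + "/"
    return [name for name in blob_names if name.startswith(prefix)]


# Hypothetical blob names inside a single container.
container = [
    "Data/customers.csv",
    "Data/2024/orders.csv",
    "Archive/old.csv",
    "DataBackup/customers.csv",
]

print(blobs_in_folder(container, "Data"))
# ['Data/customers.csv', 'Data/2024/orders.csv']
```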

Formatting HDFS/DBFS Paths

As is the case with File System paths, this value is just the full path to the folder you wish to use within HDFS/DBFS.

File Format

The file format field determines how Data Governor reads from and writes to files.

In most cases, you would use the “Delimited” format, which covers standard flat file types.

Format Descriptions

File Format      Description
Delimited        The files being processed are delimited using a human-readable character or set of characters.
Hex Delimited    The files being processed use a hexadecimal-based delimiter.
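To make the idea of a hexadecimal-based delimiter more concrete, the sketch below splits a record on the non-printable character with hex code 1F (the ASCII unit separator). The "1F" notation and the helper split_on_hex_delimiter are assumptions for illustration; how Data Governor expects a hex delimiter to be entered is not covered here.

```python
def split_on_hex_delimiter(line: str, hex_code: str) -> list[str]:
    """Split a line on the character identified by a hex code such as "1F"."""
    # bytes.fromhex("1F") gives b"\x1f"; latin-1 maps each byte to one character.
    delimiter = bytes.fromhex(hex_code).decode("latin-1")
    return line.split(delimiter)


# A record whose cells are separated by the unit separator (0x1F),
# a character that cannot be typed or read easily.
record = "1001\x1fAlice\x1fLondon"

print(split_on_hex_delimiter(record, "1F"))  # ['1001', 'Alice', 'London']
```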

Delimiter

This is the character used to split cells. In most cases this will be a single character such as a comma (.csv). If you need a whitespace character such as a tab (.tsv), you can select it from the delimiter dropdown.

[Image: delimiter dropdown]
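The effect of the delimiter can be sketched with Python's standard csv module: the same rows are recovered from comma-separated and tab-separated text once the matching delimiter is supplied. The sample data and the helper read_rows are illustrative only and do not reflect Data Governor's internals.

```python
import csv
import io


def read_rows(text: str, delimiter: str) -> list[list[str]]:
    """Parse delimited text into a list of rows, each a list of cells."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


csv_text = "id,name\n1,Alice\n"    # comma-delimited (.csv)
tsv_text = "id\tname\n1\tAlice\n"  # tab-delimited (.tsv)

print(read_rows(csv_text, ","))   # [['id', 'name'], ['1', 'Alice']]
print(read_rows(tsv_text, "\t"))  # [['id', 'name'], ['1', 'Alice']]
```

Reading the tab-separated text with a comma delimiter would instead return each line as a single cell, which is why the delimiter has to match the file.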

Encoding

This is the file encoding used when reading from and writing to the files associated with this file definition.

If you are unsure about what to use, select “UTF-8” as it supports the widest range of characters and languages.
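The following sketch illustrates why UTF-8 is a safe default: text mixing several scripts encodes cleanly as UTF-8 but cannot be represented in narrower encodings such as Latin-1 or ASCII. The sample text and the helper can_encode are illustrative assumptions.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


sample = "café, Zürich, 東京"

print(can_encode(sample, "utf-8"))    # True
print(can_encode(sample, "latin-1"))  # False (no Japanese characters)
print(can_encode(sample, "ascii"))    # False (no accented characters)

# UTF-8 round-trips the text without loss.
print(sample.encode("utf-8").decode("utf-8") == sample)  # True
```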

Extension

This is the file extension used when saving and retrieving files. Common examples include csv, dat and txt, but it ultimately depends on your requirements.

If you are using this File Definition as a migration target and are unsure which extension to use, csv is recommended, as you can easily view the contents of the file with Microsoft Excel.

Parquet Support

Data Governor Online supports Apache Parquet as a migration target out of the box. This means that if a target connection uses a file definition with the file format “Parquet”, Data Governor will automatically output the data to a Parquet file rather than a flat file.

Note that when using Parquet files, additional customisation options such as the delimiter and encoding are limited, as these settings do not affect the creation of Parquet files.

If your files need a header row (used for displaying the column names), check the header option so that Data Governor factors this in for migrations to and from the file definition.
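The sketch below shows why the header setting has to match the file, using Python's standard csv module rather than Data Governor itself. The helper read_records and the sample data are hypothetical. When a file without a header row is read as if it had one, the first data row is consumed as column names.

```python
import csv
import io


def read_records(text, has_header, column_names=None):
    """Read delimited text into dictionaries keyed by column name."""
    stream = io.StringIO(text)
    if has_header:
        # The first row supplies the column names.
        reader = csv.DictReader(stream)
    else:
        # No header row, so the column names must be given separately.
        reader = csv.DictReader(stream, fieldnames=column_names)
    return list(reader)


with_header = "id,name\n1,Alice\n2,Bob\n"
without_header = "1,Alice\n2,Bob\n"

print(len(read_records(with_header, True)))                     # 2
print(len(read_records(without_header, False, ["id", "name"])))  # 2

# Wrong setting: the first data row is mistaken for the header.
print(read_records(without_header, True))  # [{'1': '2', 'Alice': 'Bob'}]
```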

Validating File Definitions

After creating or editing a file definition, it is highly recommended that you validate it with an agent. To do so, click the tick button next to the file definition and select an agent to validate it with.

Data Governor will check that the path is valid and that the agent can work with the provided definition.
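Before validating with an agent, a File System path can be sanity-checked locally. The sketch below is only an assumption about what a basic check might involve (the folder exists and is readable and writable). It is not the validation Data Governor performs, and the helper folder_is_usable is hypothetical.

```python
import os
import tempfile


def folder_is_usable(path: str) -> bool:
    """Return True if path is an existing folder we can read and write."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK)


# Demonstrate with a temporary folder so the example runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    existing_ok = folder_is_usable(folder)
    missing_ok = folder_is_usable(os.path.join(folder, "missing"))

print(existing_ok)  # True
print(missing_ok)   # False
```

A check like this only reflects the machine it runs on, so validating with the agent remains the reliable test.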