Some users may never need file definitions, because all of their data is transferred between databases and other non-file-based storage.
File definitions are required when you need to work with data in files rather than in database tables or APIs. Some common use cases for file definitions include:
This is a perfect example of creating a File System file definition, as it allows you to easily work with flat files stored on the same server as the agent.
Take the following example.
If I wanted to use the data stored in these CSV files, it’s as simple as creating a file definition with the following values:
|File Definition Types||File System||We are working with files stored on the local file system.|
||This is the folder that contains all the CSV files we want to query.|
|File Format||Delimited||CSV files are delimited, with a
||As mentioned previously, this is the value that separates cells in a CSV file.|
Once this is configured, you can use the file definition as a source in data migrations - each file is treated essentially like a table in a database schema.
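To illustrate the idea of a folder of delimited files behaving like the tables of a schema, here is a minimal Python sketch. It is not how Data Governor implements file definitions; the function name `load_delimited_folder`, the `*.csv` filter, and the assumption that each file's first row holds the column names are all hypothetical choices made for this example.

```python
import csv
from pathlib import Path


def load_delimited_folder(folder, delimiter=","):
    """Read every .csv file in `folder` into a dict keyed by file name.

    Hypothetical helper for illustration only: each file plays the role of
    a table, and the file name (without extension) is the table name.
    """
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # Assumption: the first row of each file holds the column names.
            reader = csv.DictReader(handle, delimiter=delimiter)
            tables[path.stem] = list(reader)
    return tables
```

Calling `load_delimited_folder` on a folder returns one entry per CSV file, each holding that file's rows as dictionaries keyed by column name. The `delimiter` argument corresponds to the value that separates cells in the file.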
File System File Definitions are used for working with traditional file storage, such as the hard disk of a server.
When you need Data Governor to process files stored on the same host as the agent, use the File System type.
Azure Blob Folders are file definitions that can be used to pull files out of a folder in an Azure Blob container.
They essentially act as the “schema” for an Azure Blob Connection.
When working with big data platforms such as Hadoop and Databricks, data must be uploaded to their file systems so that the data processing platforms can perform operations and queries on it.
The Hadoop File System and Databricks File System (HDFS/DBFS) file definition type is used to define target connections for data migrations, where the target connection is one of the previously mentioned big data platforms.
Currently, HDFS/DBFS File Definitions only support uploading delimited flat files to the targets. In future releases, binary file formats such as Apache Parquet will be supported.