As Data Governor Online is a cloud-hosted solution, many considerations and precautions have been taken during its development so that your data is handled as securely as possible.
This article details how Data Governor handles sensitive information throughout the application, as well as which systems your data actually passes through during the execution of jobs and tasks.
The main points of this article are:
On Linux, the agent relies on a dgusers group for the purposes of file permissions.
When installing the Data Governor Agent, the first considerations are which security exceptions need to be made for the Agent to work on your server, and how messages are transmitted between your server and the REST API.
The agent itself does not host any servers; instead, it communicates with the Data Governor Server over a full-duplex WebSocket connection. WebSocket is a standard protocol (IETF RFC 6455) for real-time communication over the internet. Notably, WebSockets run over the standard ports for HTTP (80) and HTTPS (443).
As Data Governor Online uses HTTPS for transmission, only port 443 needs to be open for the agent to communicate with the server.
This means the agent can be hosted on any server from which you can browse the internet; provided that port is open, there are no additional requirements.
If the Data Governor Agent sits behind a VNet, you will need to allow the following connections for your location, as this is the host the agent uses to connect to Data Governor Online.
All communication between clients and the server in Data Governor is over HTTPS, meaning that all packets are encrypted in transit. Messages sent to the agent are additionally encrypted using unique asymmetric (public/private key) encryption, so even if HTTPS were not in place, the messages would still be secured.
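The property that matters in asymmetric encryption is that anyone holding the public key can encrypt a message, but only the holder of the matching private key can decrypt it. The toy textbook-RSA sketch below, using tiny primes, shows that round trip in plain Python. It is deliberately insecure and illustrates the general idea only; it is an assumption, not the scheme Data Governor actually uses. Real deployments rely on a vetted cryptography library with large keys and padding such as RSA-OAEP.

```python
from typing import Tuple

Key = Tuple[int, int]  # (modulus n, exponent)


def make_keypair(p: int, q: int, e: int) -> Tuple[Key, Key]:
    """Build a toy RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: modular inverse of e (Python 3.8+)
    return (n, e), (n, d)


def encrypt(public_key: Key, m: int) -> int:
    n, e = public_key
    return pow(m, e, n)


def decrypt(private_key: Key, c: int) -> int:
    n, d = private_key
    return pow(c, d, n)


# Tiny textbook parameters: insecure, for illustration only.
public, private = make_keypair(61, 53, 17)
ciphertext = encrypt(public, 65)
print(ciphertext)                    # 2790
print(decrypt(private, ciphertext))  # 65
```

Intercepting the ciphertext alone is not enough to recover the message: the public key `(3233, 17)` can encrypt, but only the private exponent `2753` undoes it.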
If you are running the Agent on Windows, it must run under a user account that has the “Log on as a service” right.
On Linux, a user group called dgusers will be created during installation. This group is purely for managing file permissions in the /.dgagent/ directory; the user the agent runs as will be added to this group.
Data Governor Online’s Data Migration task allows you to easily move data across different sources and targets, even if they’re not on the same host. A major aspect of Data Migrations, and of all tasks in Data Governor Online, is that none of your data ever passes through our servers, nor is it saved in any form in your tenant database.
Keeping client data off our servers while it is in motion is the reason you are required to install the Data Governor Agent: you have full control over where it is installed, so you can be sure your data only passes through the server that hosts the agent.
With Data Migration in particular, the agent never saves data from the migration. Data is pulled from the source and pushed to the target; the only transmissions to the server are logs and the result of the task, whether success or failure.
The following diagram visualises how data flows in a Data Migration task:
As you can see, the blue lines (representing your data being migrated) never leave the path from the source connection, through the agent, to the target connection. The messages transmitted from the agent to the server contain no data from the migration; instead, they provide status updates on the migration and the job as a whole.
This flow is the same for cloud and on-premises connections. Even if you use Azure SQL as a migration source or target, the agent will never transmit that data through Data Governor’s Azure SQL Server.
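The flow described above can be sketched as a streaming loop in which each row goes from the source straight to the target, while the only thing handed to the server-facing channel is a status summary. All names here (`migrate`, `StatusMessage`, the callables) are hypothetical, and in-memory lists stand in for real connections; this is a sketch of the principle, not Data Governor's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class StatusMessage:
    """What the agent reports to the server: counts and outcome, no row data."""
    job_id: str
    rows_copied: int
    succeeded: bool
    error: Optional[str] = None


def migrate(job_id: str,
            read_source: Callable[[], Iterable[tuple]],
            write_target: Callable[[tuple], None],
            report: Callable[[StatusMessage], None]) -> None:
    """Stream rows from source to target; only a status summary goes to `report`."""
    copied = 0
    try:
        for row in read_source():
            write_target(row)  # the row goes straight to the target and is not kept
            copied += 1
    except Exception as exc:
        # Report that it failed and where, but not the data being moved.
        report(StatusMessage(job_id, copied, False, type(exc).__name__))
        return
    report(StatusMessage(job_id, copied, True))


# In-memory stand-ins for the source, the target and the server channel.
source_rows = [(1, "alice"), (2, "bob")]
target_rows: List[tuple] = []
sent_to_server: List[StatusMessage] = []
migrate("job-42", lambda: iter(source_rows), target_rows.append, sent_to_server.append)
print(target_rows)     # [(1, 'alice'), (2, 'bob')]
print(sent_to_server)  # [StatusMessage(job_id='job-42', rows_copied=2, succeeded=True, error=None)]
```

Reporting the exception type rather than its message is a deliberate choice in this sketch, since driver error messages can sometimes echo row values.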
Each Data Governor Online tenant has its own associated database. If you have access to multiple tenants, switching between them is essentially a case of changing which database your user account queries.
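Per-tenant isolation can be pictured as a lookup from tenant ID to that tenant's own database, so switching tenants simply changes which database is queried. In the hypothetical sketch below, in-memory SQLite databases stand in for the real tenant databases, and the `TenantRouter` class and `tasks` table are illustrative only.

```python
import sqlite3
from typing import Dict


class TenantRouter:
    """Hypothetical sketch: one database per tenant; switching = choosing another DB."""

    def __init__(self) -> None:
        self._databases: Dict[str, sqlite3.Connection] = {}

    def connect(self, tenant_id: str) -> sqlite3.Connection:
        # Each ":memory:" connection is a separate, isolated database.
        if tenant_id not in self._databases:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE tasks (name TEXT)")
            self._databases[tenant_id] = conn
        return self._databases[tenant_id]


router = TenantRouter()
router.connect("tenant-a").execute("INSERT INTO tasks VALUES ('nightly load')")
print(router.connect("tenant-a").execute("SELECT COUNT(*) FROM tasks").fetchone())  # (1,)
print(router.connect("tenant-b").execute("SELECT COUNT(*) FROM tasks").fetchone())  # (0,)
```

A task created while working in one tenant is simply not present in another tenant's database, so there is nothing for a query against the wrong tenant to return.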
Data Governor Online has been designed from the ground up to only store data required by Data Governor to operate as expected.
Before detailing how each area of Data Governor is handled in the database, note that the following are standard across all database transactions:
Connections in Data Governor are the foundation for working with a variety of data sources and targets. As the Agent must be able to connect to these sources and targets, connection details are stored in the connections table of the tenant.
This is the only scenario in which Data Governor stores data in the tenant that users of the product would consider sensitive. To account for this, additional security measures are applied when working with connection data.
Usernames and passwords for connections are stored separately from the connection string. Consider the following example for an Azure SQL Database:
In the case of Azure SQL, the username and password would usually be passed into the connection string as User Id and Password respectively. In Data Governor, these fields are not included in the connection string; instead, you enter them into the inputs provided above the connection string editor.
This stores the username and password separately from the connection string. The password field receives extra care: its value is encrypted with a per-tenant symmetric key (on top of the encryption provided by Azure SQL). The only time this value is decrypted is when it is sent to the agent for use in a task; the decrypted value is not stored anywhere and is disposed of as soon as the operation completes.
If you are still not comfortable storing the password on Data Governor’s servers despite these measures, there are a few ways to access some data sources and targets without providing credentials.
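The separation can be illustrated in code: credential keys are pulled out of the connection string and kept apart, leaving a string that contains no secrets. In Data Governor this happens through the separate username and password inputs rather than by parsing, so the `split_credentials` helper below is a hypothetical sketch. It assumes a simple `key=value;` format with no quoted values containing `;`, treats only `User ID` and `Password` (case-insensitively) as credentials, and uses placeholder server, database and credential values.

```python
from typing import Dict, Tuple

# Connection-string keys treated as credentials (assumed set, matched case-insensitively).
CREDENTIAL_KEYS = {"user id": "username", "password": "password"}


def split_credentials(conn_str: str) -> Tuple[str, Dict[str, str]]:
    """Separate credential keys from a `key=value;` connection string.

    Naive parser: assumes no value contains a ';' character.
    """
    kept, creds = [], {}
    for part in conn_str.split(";"):
        if not part.strip():
            continue  # skip empty segments, e.g. after a trailing ';'
        key, _, value = part.partition("=")
        name = CREDENTIAL_KEYS.get(key.strip().lower())
        if name:
            creds[name] = value.strip()
        else:
            kept.append(part.strip())
    return (";".join(kept) + ";" if kept else ""), creds


conn = ("Server=tcp:example.database.windows.net,1433;Initial Catalog=SalesDb;"
        "User ID=dg_reader;Password=not-a-real-password;Encrypt=True;")
clean, creds = split_credentials(conn)
print(clean)  # Server=tcp:example.database.windows.net,1433;Initial Catalog=SalesDb;Encrypt=True;
print(creds)  # {'username': 'dg_reader', 'password': 'not-a-real-password'}
```

Keeping the password out of the connection string means it can be encrypted and handled on its own path, while the rest of the string carries no secret.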
When working with Data Governor tasks, there are many places where a task contains user-entered content that could include sensitive data.
Data Governor Online cannot protect against sensitive content being included in tasks; it is the responsibility of the user to ensure that tasks do not contain sensitive information.
There are a few points to consider for the various task types:
Data Migration tasks only store data relating to the schema of the data being migrated; no information about the contents of the data sources is kept in the task.
That being said, when a task uses Query as Source, ensure that your query does not contain any sensitive information.
These tasks involve the user providing a script for the Data Governor Agent to execute, so the task itself may end up containing sensitive information. This is problematic because not only is the data saved in the task table, but logging from task executions may also duplicate the sensitive data across app logs.
If your scripts require sensitive data, we recommend providing the values through environment variables on the agent’s host.
For example, in PowerShell:
# Retrieve a password value from a pre-defined environment variable.
$password = $env:EXAMPLE_PASS
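Environment variables keep the secret out of the task definition, but a script can still leak it by printing or logging it. One general technique, shown here in Python rather than PowerShell, is to fail loudly when the variable is missing and to mask the value before log lines are written. `read_secret` and `RedactFilter` are hypothetical helpers built on the standard `os` and `logging` modules; they are not part of Data Governor, and `EXAMPLE_PASS` is the same placeholder variable name used in the PowerShell example.

```python
import io
import logging
import os


class RedactFilter(logging.Filter):
    """Replace known secret values in log messages before they are emitted."""

    def __init__(self, secrets) -> None:
        super().__init__()
        self._secrets = [s for s in secrets if s]  # ignore unset/empty values

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "***")
        record.msg, record.args = message, None  # store the already-masked text
        return True


def read_secret(name: str) -> str:
    """Fetch a secret from the environment, failing loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"environment variable {name} is not set on the agent host")
    return value


# Demo only: on a real agent host the variable is configured outside the script.
os.environ.setdefault("EXAMPLE_PASS", "s3cret-demo")
password = read_secret("EXAMPLE_PASS")

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactFilter([password]))
log = logging.getLogger("script")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

log.info("connecting with password %s", password)
print(stream.getvalue().strip())  # connecting with password ***
```

This only masks exact occurrences of the value; encoded or partial copies would still get through, so not writing secrets to output at all remains the safer habit.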
One of Data Governor’s major features is its extensive logging. Logging is limited to the following information:
Output from stdout and stderr, written to Information and Error logs respectively.
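The stdout/stderr mapping can be sketched with Python's `subprocess` and `logging` modules, assuming the script runs as a child process and using Python's `INFO` and `ERROR` levels as stand-ins for Data Governor's Information and Error logs. `run_and_log` and `ListHandler` are hypothetical names for this illustration.

```python
import logging
import subprocess
import sys
from typing import List, Tuple


def run_and_log(cmd: List[str], log: logging.Logger) -> int:
    """Run a command, logging stdout lines as INFO and stderr lines as ERROR."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    for line in result.stdout.splitlines():
        log.info("%s", line)
    for line in result.stderr.splitlines():
        log.error("%s", line)
    return result.returncode


class ListHandler(logging.Handler):
    """Collect (level, message) pairs so the mapping can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[Tuple[str, str]] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append((record.levelname, record.getMessage()))


log = logging.getLogger("task")
log.setLevel(logging.INFO)
handler = ListHandler()
log.addHandler(handler)
log.propagate = False

script = "import sys; print('rows copied: 2'); print('failed to reach target', file=sys.stderr)"
code = run_and_log([sys.executable, "-c", script], log)
print(handler.records)  # [('INFO', 'rows copied: 2'), ('ERROR', 'failed to reach target')]
```

Because the two streams are captured separately, this sketch loses the relative ordering of stdout and stderr lines; preserving it would require reading both pipes concurrently.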