Getting to Know Your Data



Updated October 9, 2023
Security for Everyone

Identifying and protecting that which has value for your organization and your customers is at the heart of how you should approach security. For many of us, that value lies in the data we store and process.

This includes customer data, commercial IP, and company operational data. Some of this data is created by your organization and did not exist before. Other data is entrusted to your organization as part of the product or service you offer. These different data types, their sensitivity, and the paths they take through your company are all important factors to understand when planning how to protect them.

When you start defining your data protection requirements, begin by identifying all the data within your organization.

Then, for each data set, gather the information listed in the table below so that you better understand the data and its security requirements.

Table: Things to Know About Your Data

| Question | Why is this important? |
| --- | --- |
| What is this data? | Establishes the structure, type, and purpose of the data you are reviewing. |
| What internal names are used for this data? | Some data sets go by different names in different teams or organizations. Identifying any alternative names for the same data set helps avoid duplication in our records and processes. |
| Where is it stored? | Identifying every location where the data is stored, processed, or handled shows us all the places it could be at risk (and may need protection). |
| Where does it come from? | The source of the data helps us understand its value and the stakeholders who care about its safe storage and use. |
| How much of it exists? | The more data we have, the more we have to protect. Knowing the scale of the data helps define the scope of what needs protecting. |
| How frequently does it change? | Is the data static or dynamic? If it changes frequently, our approach to protecting it must account for that. |
| What causes this data to change? | Whether the cause is human action (such as editing a file) or automated action (such as a process or system), the causes of change often need to be identified so that those actions can be monitored and recorded. |
| Who can access this data? | As discussed previously, people should have the minimum permissions they need to get the job done. This principle of least privilege reduces the chance of accidents and of malicious access. |
| Who is the owner of this data? | Every data set should have an owner (an individual or a role): the person who understands the data and is responsible for its life within your organization. Clear ownership gives us a single point of accountability for decisions on access, on changes to how the data is protected, and eventually on destroying it. |
| How long is this data stored for? | The easiest data to protect is data we don't store; the next easiest is data kept only for as long as it is needed. Defining how long we keep data (its retention period) lets us minimize what we store by proactively removing it when it is no longer needed. Retention periods may be set by your company, by legal or financial requirements, or by your customers. |
| Do we share this data with any other organization? | Sharing data exposes it to risk. The organizations we share it with may take different approaches to data protection and risk, so when we choose to share data, we must understand what that means. |
| What regulations, laws, or compliance requirements is this data subject to? | Whether it's personally identifiable information (PII) or health information, many global and industry regulatory standards govern what we can and cannot do with certain types of data. Identifying these protected data types proactively reduces risk and helps ensure we meet those standards. |
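If you keep your inventory in code or export it from a spreadsheet, each question in the table can become a field in a record. The Python sketch below is one hypothetical way to do that; the class name `DataSetRecord`, its field names, the `unanswered` helper, and the Snapsy values are assumptions for illustration, not a prescribed schema. It uses `None` to mean "not answered yet", so an empty list can still record a genuine answer such as "shared with no one".

```python
from dataclasses import dataclass, fields
from datetime import timedelta
from typing import List, Optional


@dataclass
class DataSetRecord:
    """One data inventory entry (hypothetical schema); each field answers one question.

    None means "not answered yet". An empty list is a real answer
    (for example, shared_with=[] means "shared with no one").
    """
    name: str                                  # What is this data?
    aliases: Optional[List[str]] = None        # Other internal names
    locations: Optional[List[str]] = None      # Where is it stored?
    source: Optional[str] = None               # Where does it come from?
    volume: Optional[str] = None               # How much of it exists?
    change_frequency: Optional[str] = None     # How frequently does it change?
    change_causes: Optional[List[str]] = None  # What causes it to change?
    accessors: Optional[List[str]] = None      # Who can access it?
    owner: Optional[str] = None                # Who owns it?
    retention: Optional[timedelta] = None      # How long is it stored for?
    shared_with: Optional[List[str]] = None    # Which organizations receive it?
    regulations: Optional[List[str]] = None    # Which regulations apply?


def unanswered(record: DataSetRecord) -> List[str]:
    """List the fields (questions) that have not been answered yet, in table order."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Hypothetical Snapsy entry, partially filled in.
photos = DataSetRecord(
    name="Customer photo uploads",
    aliases=["user images"],
    locations=["object storage", "print-processing queue"],
    source="Uploaded by customers through the mobile app",
    owner="Head of Product",
    retention=timedelta(days=90),
    shared_with=["local printing partners"],
)

print(unanswered(photos))
# ['volume', 'change_frequency', 'change_causes', 'accessors', 'regulations']
```

Listing the unanswered questions per data set gives you a simple to-do list for completing the inventory.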

As you catalog the data within an organization, it becomes obvious that the data landscape isn’t static. Data enters the organization, is used and handled, and at some point may leave the organization again. We call this a data lifecycle: the series of stages that data passes through while in a company's custody.

An Example Data Lifecycle: Snapsy

Let’s take a look at the data lifecycle for some simple data types within a hypothetical organization we’ll call Snapsy. Snapsy is a Software as a Service (SaaS) company that provides photographic printing services via its mobile application. In the diagram below, we’ve visualized how data moves through Snapsy.

Figure: The data lifecycle for data within an example organization.

On the left of our diagram, data enters our organization via two different routes:

  • Data is created: User records are created when a user is registered.

  • Data is collected: Images are uploaded into the application when a user chooses to get them printed.

Once the data has entered our organization, it moves to the active usage phase of its lifecycle. In this stage, the data has purpose and is being used to serve our customers and deliver our intended products and services.

In the center of our diagram, our data is used in several ways at this stage:

  • Images are processed, ready for printing.

  • Images are analyzed to provide metrics and statistics the company uses to measure its performance.

  • Images, metadata, and metrics are stored for use later in the customer’s journey or as part of the ongoing analysis and operation of the company.

  • Images are shared with local printing companies so that they can be printed and shipped to the customer.

At the point where the data is shared with the local printing company, it has left Snapsy's control. Snapsy can no longer protect this data, because it has started a new data lifecycle within the printing company.
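The Snapsy lifecycle, from entry through to the end-of-life stages at the right of the diagram, can be modeled as a small set of stages with allowed transitions. This is a minimal sketch assuming we track one copy of the data at a time; the stage names, the `TRANSITIONS` table, `OUT_OF_OUR_CONTROL`, and `is_valid_path` are hypothetical. It shows the point made above: once a copy reaches the shared stage, there are no further transitions on our side, because that copy's lifecycle continues at the partner.

```python
from enum import Enum, auto
from typing import Dict, List, Set


class Stage(Enum):
    CREATED = auto()    # data generated by us, e.g. a user record at registration
    COLLECTED = auto()  # data received from outside, e.g. an uploaded image
    ACTIVE = auto()     # processed, analyzed, and stored while it serves a purpose
    SHARED = auto()     # this copy was handed to a third party, e.g. a printer
    ARCHIVED = auto()   # end of useful life: kept in an archive
    DESTROYED = auto()  # end of useful life: deleted


# Hypothetical allowed transitions for a single copy of the data.
TRANSITIONS: Dict[Stage, Set[Stage]] = {
    Stage.CREATED: {Stage.ACTIVE},
    Stage.COLLECTED: {Stage.ACTIVE},
    Stage.ACTIVE: {Stage.SHARED, Stage.ARCHIVED, Stage.DESTROYED},
    Stage.SHARED: set(),  # outside our control: a new lifecycle starts at the partner
    Stage.ARCHIVED: set(),
    Stage.DESTROYED: set(),
}

# Stages in which we can no longer apply our own protections to this copy.
OUT_OF_OUR_CONTROL = {Stage.SHARED}


def is_valid_path(path: List[Stage]) -> bool:
    """True if every consecutive step in the path is an allowed transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


# The copy of an uploaded image that Snapsy sends to a local printer.
printer_copy = [Stage.COLLECTED, Stage.ACTIVE, Stage.SHARED]
print(is_valid_path(printer_copy))             # True
print(printer_copy[-1] in OUT_OF_OUR_CONTROL)  # True
```

Modeling the copy kept by Snapsy and the copy sent to the printer as separate paths makes it explicit which copies your own controls still cover.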

Danger: Use caution when choosing data-sharing partners and integrations, because the data you share with them leaves your control. You are trusting these organizations to protect your data as you would want it protected, so take the time to verify this before you commit. Also consider what the implications would be for your organization if a partner or integration were breached and the data you shared with them were lost or made public. You cannot prevent such a breach, but you must plan for this worst-case scenario.

At the right of our diagram, the data has reached the end of its useful life within our company. This could be triggered by a number of factors. These include:

  • The user deletes their data or closes their account.

  • The data retention period expires.

  • The data is no longer relevant to the current customer or business model.

At this stage, the data will either be destroyed or archived.
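The end-of-life triggers listed above can be expressed as a simple decision function. This sketch assumes the retention period is measured from the date the data was collected; the function `end_of_life_action`, its parameters, and the `archive_required` flag (standing in for whatever policy decides between archiving and destroying) are hypothetical.

```python
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    ARCHIVE = "archive"
    DESTROY = "destroy"


def end_of_life_action(
    collected_on: date,
    retention: timedelta,
    today: date,
    user_requested_deletion: bool = False,
    still_relevant: bool = True,
    archive_required: bool = False,  # hypothetical policy flag: archive rather than destroy
) -> Action:
    """Decide what to do with a data set based on the end-of-life triggers."""
    retention_expired = today >= collected_on + retention
    reached_end_of_life = (
        user_requested_deletion or retention_expired or not still_relevant
    )
    if not reached_end_of_life:
        return Action.KEEP
    return Action.ARCHIVE if archive_required else Action.DESTROY


start = date(2023, 1, 1)
print(end_of_life_action(start, timedelta(days=90), today=date(2023, 3, 1)))  # Action.KEEP
print(end_of_life_action(start, timedelta(days=90), today=date(2023, 4, 1)))  # Action.DESTROY
```

Running a check like this on a schedule turns the retention period from a documented intention into something that actually removes data.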

Don’t Lose Track of Your Data Lifecycles

Remember that as your organization, products, and services evolve, so do the lifecycles your data follows.

Important: Review and update your data lifecycles at regular intervals so that they remain accurate and the security controls you apply to them remain appropriate for reducing the risk of data loss, destruction, or exposure.

Even if you choose not to document your data lifecycles formally, walk through them on a whiteboard at least once a year, and whenever the business changes significantly, so that you have a clear understanding of them and an actionable plan for weaving security through them. Involve people from around your organization in this exercise: the people who interact with the data the most are often best placed to help define its lifecycles.

As you carry out this process, you and your team will naturally start to form a language to describe sensitive data, whenever and wherever you find it.

Definition: The process of flagging specific types of data and giving them a name, label, or description that communicates their importance is known as information classification.

To get the most out of your data lifecycles, we need to go a little further into this important subject. With our data and its lifecycles identified, it’s time to build an information classification system.

Not All Data Needs to Be Secured

You might be forgiven for dreading this section. Phrases like “information classification system” rarely spark excitement. But stay with me—let’s move past the dry language and dig into what this phrase means and why it matters to our company and its data.

The importance of this section starts with a truth: just because our organization generates, stores, or processes a piece of information, that doesn’t mean it is sensitive or needs securing.

Some of the data we handle, generate, or process poses little to no risk to our organization, no matter what we do with it. Conversely, there are data types that we encounter that can have significant impacts on our organization, our systems, or our customers—if they are mishandled.
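To make this concrete, here is a minimal sketch of how classification labels can separate data that needs protective controls from data that does not. The three levels, their names, the Snapsy data sets, their labels, and the `needing_controls` helper are hypothetical placeholders rather than a recommended scheme; your own classification system will define its own levels.

```python
from enum import IntEnum
from typing import Dict, List


class Classification(IntEnum):
    """Hypothetical three-level scheme: higher means more harm if mishandled."""
    PUBLIC = 0        # little to no risk, whatever we do with it
    INTERNAL = 1      # for use inside the company only
    CONFIDENTIAL = 2  # significant impact on the company or customers if mishandled


# Hypothetical labels for a few Snapsy data sets.
LABELS: Dict[str, Classification] = {
    "published price list": Classification.PUBLIC,
    "print analytics metrics": Classification.INTERNAL,
    "customer photo uploads": Classification.CONFIDENTIAL,
    "user account records": Classification.CONFIDENTIAL,
}


def needing_controls(
    labels: Dict[str, Classification],
    threshold: Classification = Classification.INTERNAL,
) -> List[str]:
    """Data sets at or above the threshold, most sensitive first."""
    selected = [name for name, label in labels.items() if label >= threshold]
    # sorted() is stable, so equally labeled data sets keep their original order.
    return sorted(selected, key=lambda name: labels[name], reverse=True)


print(needing_controls(LABELS))
# ['customer photo uploads', 'user account records', 'print analytics metrics']
```

Data labeled as public drops out of the list entirely, so protective effort goes only where mishandling could cause harm.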
