
Data Protection


From edition e1.0.0

Updated October 9, 2023

Security for Everyone

🚀 As explained by Laura

Getting to Know Your Data

Identifying and protecting that which has value for your organization and your customers is at the heart of how you should approach security. For many of us, that value lies in the data we store and process.

This includes customer data, commercial IP, and company operational data. Some of this data is created by your organization and did not exist before this point. Other data is entrusted to your organization as part of the product or service that you offer. These different data types, their sensitivity, and the pathway they take through your company are all important factors to understand when planning how to protect them.

When starting to define your data protection requirements, we begin by identifying all the data within your organizational context.

Then, for each set of data, we need to gather some information to understand the data and its security requirements better. Specifically, we need the answers to the questions in the table below.

Table: Things to Know About Your Data

Question	Why is this important?
What is this data?	Understanding the structure, type, and purpose of the data you are reviewing.
What are the internal names used to refer to this data?	Some datasets will have different names with different teams or across different organizations. Identifying any alternative names for the same data set helps avoid duplication in our records and processes.
Where is it stored?	Identifying all of the locations that data is stored, processed, or handled allows us to understand all of the places it could be at risk (and may need protection).
Where does it come from?	The data source allows us to understand the value of a data set and the stakeholders interested in its safe storage and usage.
How much of it exists?	The more data we have, the more we have to protect. Understanding the scale of the data is important when understanding the scope of what needs to be protected.
How frequently does it change?	Is the data static or dynamic? If data changes frequently, then our approach to protecting it needs to respect and work in this situation.
What causes this data to change?	Whether it’s human action (such as editing a file) or automated action (such as by a process or system), the causes of data change often need to be identified so that such actions can be monitored and recorded.
Who can access this data?	As we have discussed previously, we need to ensure that people have the minimum permissions to get the job done. This principle of least privilege helps us to ensure that accidents are avoided and the chance of malicious access is reduced.
Who is the owner of this data?	Every data set should have an owner (either an individual or a role): the person who understands it and is responsible for its life within your organization. This accountability and ownership simplifies decisions and gives us a central point for decisions on access, changes to data protection approaches, and the eventual decision to destroy the data.
How long is this data stored for?	The easiest data to protect is the data we don’t store. The next easiest data to protect is the data that is only kept for the time it is needed. Defining how long we keep data for (also known as its retention period) helps us to minimize the data we store by proactively removing it when it is no longer needed. Data retention periods may be defined by your company, by legal or financial systems or your customers.
Do we share this data with any other organization?	Sharing data exposes it to risk. Those we share it with may have different approaches to data protection and risk, so when we choose to share data, we must do so with an understanding of what this means.
What regulations, laws, or compliance requirements is this data subject to?	Whether it’s personally identifiable information (PII) or health information, there are many global and industry regulatory standards that govern what we can and cannot do with certain types of data. Proactively identifying these protected data types allows us to reduce risk and ensure we meet these standards.
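One lightweight way to capture these answers is a structured record per data set. The sketch below is only an illustration: the `DataInventoryEntry` class, its field names, and the `unanswered_questions` helper are hypothetical, and a spreadsheet with the same columns would serve equally well.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataInventoryEntry:
    """One row in a data inventory; each field answers a question from the table."""
    description: str                 # What is this data?
    internal_names: List[str]        # What are the internal names used for it?
    storage_locations: List[str]     # Where is it stored?
    source: str                      # Where does it come from?
    approximate_volume: str          # How much of it exists?
    change_frequency: str            # How frequently does it change?
    change_causes: List[str]         # What causes this data to change?
    access_roles: List[str]          # Who can access this data?
    owner: Optional[str] = None      # Who is the owner (individual or role)?
    retention_days: Optional[int] = None                    # How long is it stored for?
    shared_with: List[str] = field(default_factory=list)    # Other organizations
    regulations: List[str] = field(default_factory=list)    # Applicable requirements


def unanswered_questions(entry: DataInventoryEntry) -> List[str]:
    """Return the fields still set to None, so the gaps can be followed up."""
    return [name for name, value in vars(entry).items() if value is None]
```

Listing the unanswered fields makes it easy to spot data sets that still lack an owner or a retention period.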

As you catalog the data within an organization, it starts to become obvious that the data landscape isn’t static. Data enters the organization, is used and handled, and at some point may leave the organization again. We call this a data lifecycle, a process of stages that data passes through whilst in the custody of a company.

An Example Data Lifecycle: Snapsy

Let’s take a look at the data lifecycle for some simple data types within a hypothetical organization we’ll call Snapsy. Snapsy is a Software as a Service (SaaS) company that provides photographic printing services via their mobile application. In the diagram below, we’ve visualized how data moves through Snapsy.

Figure: The data lifecycle for data within an example organization.


On the left of our diagram, data enters our organization via two different routes:

Data is created: User records are created when a user is registered.
Data is collected: Images are uploaded into the application when a user chooses to get them printed.

Once the data has entered our organization, it moves to the active usage phase of its lifecycle. In this stage, the data has purpose and is being used to serve our customers and deliver our intended products and services.

In the center of our diagram, our data is used in several ways at this stage:

Images are processed, ready for printing.
Images are analyzed to provide metrics and statistics for the company to measure their performance.
Images, metadata, and metrics are stored for use later in the customer’s journey or as part of the ongoing analysis and operation of the company.
Images are shared with local printing companies so that they can be printed and shipped to the customer.

At the point where the data is shared with the local printing company, it has left the control of the SaaS company. It is no longer within our company’s ability to protect this data, as it has started a new data lifecycle within the printing company.

danger We should use caution when choosing our data-sharing partners and integrations because shared data transitions out of our control. These organizations are trusted to protect our data as we would want it to be protected. Take your time to verify this will be the case before you get too committed. Furthermore, it’s important to think about what the implications would be for your organization if your partner or integration were breached and the data you shared with them were lost or made public. While you cannot prevent this, you must plan for this worst-case scenario.

At the right of our diagram, the data has reached the end of its useful life within our company. This could be triggered by a number of factors. These include:

The user deletes their data or closes their account.
The data retention period expires.
The data is no longer relevant to the current customer or business model.

At this stage, the data will either be destroyed or archived.
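The end-of-life triggers above are simple enough to check automatically. The following is a minimal sketch under stated assumptions: the function names and parameters are hypothetical, and the retention period is assumed to be counted in days from the date the data was collected.

```python
from datetime import date, timedelta
from typing import Optional


def retention_expired(collected_on: date, retention_days: int, today: date) -> bool:
    """True once the data has been held longer than its retention period."""
    return today > collected_on + timedelta(days=retention_days)


def end_of_life_reason(user_deleted: bool, collected_on: date, retention_days: int,
                       still_relevant: bool, today: date) -> Optional[str]:
    """Return the first end-of-life trigger that applies, or None if the data is still active."""
    if user_deleted:
        return "user deleted their data or closed their account"
    if retention_expired(collected_on, retention_days, today):
        return "retention period expired"
    if not still_relevant:
        return "no longer relevant to the customer or business model"
    return None
```

A scheduled job could run a check like this over the data inventory and produce a list of records to destroy or archive.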

Don’t Lose Track of Your Data Lifecycles

Remember that as your organization, products, and services evolve, so do the life cycles that apply to your data.

important It is important to review and update your data life cycles at regular intervals to ensure that they remain accurate and the security controls you apply to them remain appropriate to reduce the risk of data loss, destruction, or exposure.

While you may not choose to formally document your data life cycles, run through them on a whiteboard at least once a year, or whenever the business changes significantly, to ensure you have a clear understanding and an actionable plan for weaving security through them. Engage with people from around your organization to work through this. The people who interact with the data the most are often the best placed to help define the data life cycles.

As you carry out this process, you and your team will naturally start to form a language to describe sensitive data, whenever and wherever you find it.

Definition The process of flagging specific types of data and giving it a name, label, or description that communicates its importance is known as information classification.

To get the most out of our data life cycles, we need to go a little further into this important subject. With our data and the life cycles identified, it’s time to build an information classification system.

Not All Data Needs to Be Secured

You might be forgiven for dreading this section. Phrases like “information classification system” rarely spark excitement. But stay with me—let’s move past the dry language and dig into what this phrase means and why it matters to our company and its data.

The importance of this section starts with a truth: just because our organization generates, stores, or processes a piece of information, that doesn’t mean it is sensitive or needs securing.

Some of the data we handle, generate, or process poses little to no risk to our organization, no matter what we do with it. Conversely, there are data types that we encounter that can have significant impacts on our organization, our systems, or our customers—if they are mishandled.

Definition An information classification system is, at its core, a way to label the data within your organization according to how sensitive it is and how much impact it would have on your organization if it were to be improperly handled or shared.

By identifying all of the data stored and handled within your context and dividing it into groups in this way, you can start to define processes and policy for how each group of data should be treated. Typically this includes how the information is used, where it is stored, who it is shared with, and how it is shared.

Once we have this structure of policy defined, we can allocate and prioritize our resources to protect the confidentiality, integrity, and availability of our most sensitive data. The more sensitive the data, the more effort and resources we need to keep it safe.

Let’s take a look at a typical information classification system and examples of the data we might expect to find in it.

Table: Common Information Classifications

Classification	Description	Examples
Public	Information is not confidential and can be made public without any implications for your company. Loss of availability due to system downtime is an acceptable risk. Integrity is important but not vital.	• Public domain information about the organization • Public marketing materials • Distributed product catalogs
Internal	Information is restricted to management-approved internal access and protected from external access. Unauthorized access could influence your company’s operational effectiveness, cause an important financial loss, provide a significant gain to a competitor, or cause a major drop in customer confidence. Information integrity is vital.	• Software, code and applications developed by your company or on behalf of your company • Operating procedures used in your business • Instructions, training material, guidelines, organization-wide communications
Restricted	Information received from clients in any form for processing in the company or its systems. The original copy of such information must not be changed in any way without written permission from the client. The highest possible levels of integrity, confidentiality, and restricted availability are vital.	• Client account details • Direct communications with clients • Analytics of client transactions
Confidential	Information collected and used by your company in the conduct of its business to employ people, to log and fulfil client orders, and to manage all aspects of corporate finance. Access to this information is very restricted within your company. The highest possible levels of integrity, confidentiality, and restricted availability are vital.	• Salaries and other personnel data • Accounting data and internal financial reports • Confidential customer business data and contracts • NDAs with clients and vendors • Business plans
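If you track classifications in a script or tool, an ordered enumeration keeps the levels consistent and lets you sort data sets so the most sensitive get attention first. This is a sketch, not a standard: the numeric ordering simply follows the order of the table above (an assumption), and the `prioritize` helper is hypothetical.

```python
from enum import IntEnum
from typing import Dict, List


class Classification(IntEnum):
    """Classification levels, ordered from least to most sensitive (assumed order)."""
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3
    CONFIDENTIAL = 4


def prioritize(datasets: Dict[str, Classification]) -> List[str]:
    """Order data set names from most to least sensitive."""
    return sorted(datasets, key=lambda name: datasets[name], reverse=True)


# Example data sets taken from the table above
example = {
    "Public marketing materials": Classification.PUBLIC,
    "Salaries and other personnel data": Classification.CONFIDENTIAL,
    "Operating procedures": Classification.INTERNAL,
    "Client account details": Classification.RESTRICTED,
}
```

Calling `prioritize(example)` lists the personnel data first and the marketing materials last.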

Building Your Own Classification System

While these standard definitions will work for a large number of scenarios and provide a general framework for understanding and communicating the sensitivity of your data, there are some cases where you may choose to define your own classification system. A custom classification system provides a way to communicate any data security or handling requirements that are unique to your organization, risk profile, or context.

Reasons you might want a custom classification system:

Coherence and consistency. Organizations that interact with or partner with government organizations, for example, may choose to reflect the classification systems of their more regulated government partners when defining their own system. This helps create a consistent understanding of data security expectations across the two organizations and makes communication of risk simpler and more coherent.
Culture. Another reason for choosing a custom classification system might be to echo or reflect other cultural patterns in your organization. If you have a strong communication style and language conventions in your organization, then echoing that language style in your security policy and process can help them connect with the wider company and be easy to understand. Remember that the more relatable our language choices are, the less effort is needed to comprehend their meaning. Easy to understand often means easy to action, which can be a real benefit when trying to roll out a security program.
Increased granularity. Perhaps your organization has more complex or varied data requirements that are challenging to split into the relatively small number of classifications provided by the more traditional classification systems we explained earlier in this section. In these more complex situations you may wish to have more granular requirements or options. Remember, though, that the more complicated your system, the harder it is to implement consistently and check for issues. If choosing this path, be sure to create the “minimum viable classification system” for your needs. A smaller set of requirements and behaviors will be easier to implement, explain, and monitor.

Implementing Your New Classification System

Step 1: Label Your Data

Once you have defined your classification levels, you need to find all data of each type and ensure that it is labeled correctly to communicate its sensitivity.

Figure: The first step in implementing a classification system is to find, classify, and label your data.

Now, let’s not get literal here. Nobody needs your young company to have “top secret” messages at the top of every document, and only Tom Cruise movies need to blow up their documents when they’ve been read.

When we talk about labeling our data, what we really mean is to make it simple for people to understand how sensitive data is when they are interacting with it. There are a range of approaches to labeling your data. You’ll find some of the most common ones in the table below.

Example: Strategies for Labeling Data

Labeling Strategy	Description	Example Use Case
File Names	Include the classification in the name of the file.	myfile_confidential.txt
Folder Names	Sort and store your files according to their sensitivity and label the folder with the appropriate classification level.	myfolder_confidential/file1.txt myfolder_confidential/file2.txt myfolder_confidential/file3.txt
Color Coding	Using color to signify classification in documents or artifacts
Labels and Tagging	Including keywords or tags in file, page or artifact metadata. (In this case, metadata is the data stored about the file such as the size, date created and theme, rather than the contents of the file itself)	Myfile.txt Size: 25 MB Keywords: confidential
Storage Location	Dedicating entire data stores such as databases, shared drives or filestores to a specific classification of data.	C://CompanyPublic D://Company Confidential

Once you have decided on a labeling strategy, there are a couple of important steps you need to take to make sure it sticks:

Make sure everyone understands your new system. Your team can’t follow your new system if they don’t know why they are doing it or what it means. Keep it simple, communicate it well, and model the behaviors you expect to see. How you communicate this will depend on your team’s style. Whether you roll out a poster or a specific meme in Slack, the key is to catch people’s attention, make the message easy to digest, and make it almost impossible not to follow. Embedding this message when a new person joins the team can also go a long way to making sure people understand your system from day one.
Automate labeling wherever possible. While not always possible, remember that the system you don’t have to remember to operate is often the most effective. It’s OK to make things easier for yourself and create automations or configurations that make this and other repetitive tasks easier.
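As one example of automation, a small script can read the classification back from the file name and folder name conventions in the table above. This is a minimal sketch that assumes a `_<level>` suffix convention like `myfile_confidential.txt`; the `label_from_path` function and the list of levels are hypothetical and would need to match your own scheme.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed classification levels, written as they would appear in names
LEVELS = ("public", "internal", "restricted", "confidential")


def label_from_path(path: str) -> Optional[str]:
    """Infer a classification from a `_<level>` suffix on the file or a parent folder name."""
    p = PurePosixPath(path)
    # Check the file name (without extension) first, then parent folders, nearest first
    names = [p.stem] + [parent.name for parent in p.parents]
    for name in names:
        for level in LEVELS:
            if name.lower().endswith("_" + level):
                return level
    return None  # Unlabeled: flag for a person to review
```

Files that come back as `None` are the ones worth a manual look, since nobody has yet said how sensitive they are.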

Step 2: Define Your Data Handling Guidelines

So you know where all of the data is within your organization and you have labeled it all. Fantastic! You can’t stop now, though—it’s time to define what these labels mean.

When we defined our classification system, we described the sensitivity of the data and how it would impact our company, systems, or customers if it were mishandled, but that isn’t the whole picture.

To move from conscious understanding of the risk posed by our data to protecting it from harm, we need to look at the steps we will take to reduce this risk. In the case of data classification, this is the creation of our data handling guidelines.

Data handling guidelines explicitly state the ways in which data can and cannot be treated. When deciding your guidelines, consider these elements:

Storage: Where will you keep your data, and for how long?
Access: Who can touch the data, and what can they do with it?
Purpose: Why do you have this data?
Sharing: Who outside of your organization might need access to the data, and how will you share it?
Transport: How will you move data around, whether internally or externally?
Destruction: When you don’t need this data anymore, how will you dispose of it?

When implementing your data classification system, it is important to articulate any specific requirements or policy governing these data handling elements, and then implement processes and technologies that turn these policies into repeatable actions from your team and systems.

Information classification systems exist to focus our attention on the data that poses the most risk. It follows that the higher the classification, the higher the risk, and the more policy and process will be needed to keep the data safe and secure.

Example: Data Handling Policy per Classification Level

Classification	Storage	Access	Purpose	Sharing	Transport	Destruction
Public Example: Public marketing materials	Can be stored anywhere on the company filestore for as long as they are needed.	All team members can access. Marketing team members can edit.	To support the sales process and customer success.	Can be shared externally with all potential customers, investors or interested parties.	Can be shared by email or online via website.	No special requirements.
Internal Example: Internal operating procedures	Can be stored in the company shared drive for as long as they are needed.	All team members can access and most can suggest edits. Documents have an owner.	To share how common business actions are carried out.	Can be shared between all team members. Can only be shared externally with prior permission.	Can be shared by email.	No special requirements.
Restricted Example: Customer analytics and metrics	Can be stored in a specific directory for the period they are needed or as per customer contract requirements.	Only team members with a specific business need may access. Specific permissions are needed to update or edit.	To understand customer satisfaction, company performance or behaviors.	Can be shared with other authorized team members. Not to be shared externally.	Cannot be transported without prior approval and the removal of identifying or sensitive information. Must be sent over an encrypted connection.	Data must be destroyed in line with customer requirements and contracts.
Confidential Example: Customer data	Can be stored only in a specific customer data environment, which is defined by customer contract and operating policy.	Only limited team members with authorization may access for a limited time period. Data must not be edited or changed without written permission and creation of suitable audit logs.	To serve the customer’s needs and supply your product or service.	Access requires written approval from the system owner. Data cannot be shared without explicit customer consent.	Cannot be transported without prior approval from the customer and adherence to their data handling requirements. Must be sent over an encrypted connection.	Data must be destroyed in line with customer requirements and contracts.
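Once a policy like this exists, parts of it can be written down in a form that tools can check. The sketch below encodes only the external sharing and encrypted transport rules from the table, and simplifies them; the `HANDLING` dictionary and the `may_share_externally` function are hypothetical names, not part of any product.

```python
# Simplified encoding of two columns from the example policy table
HANDLING = {
    "public": {"external_sharing": "allowed", "encrypted_transport": False},
    "internal": {"external_sharing": "prior permission", "encrypted_transport": False},
    "restricted": {"external_sharing": "not allowed", "encrypted_transport": True},
    "confidential": {"external_sharing": "customer consent", "encrypted_transport": True},
}


def may_share_externally(classification: str, has_permission: bool = False,
                         has_customer_consent: bool = False) -> bool:
    """Apply the external sharing rule for one classification level."""
    rule = HANDLING[classification]["external_sharing"]
    if rule == "allowed":
        return True
    if rule == "prior permission":
        return has_permission
    if rule == "customer consent":
        return has_customer_consent
    return False  # "not allowed", and the safe default for anything unrecognized
```

Defaulting to `False` means a level that nobody has written a rule for is treated as not shareable.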

Now that we have defined our expectations for each classification level in the form of our policy, we need to turn these expectations into actions, processes, and procedures that can be followed, implemented, or automated consistently across the business.

important The further you move from policy towards technology with your implementation, the more you may need assistance from your technology service provider or technical team members. Your role is to define the policy and expectation, and then find the right people to make your policy into an actionable reality.

Let’s take a look at one of the classification levels above as an example and a possible data handling policy:

Classification: Restricted
Example: Customer analytics and metrics
Purpose: To understand customer satisfaction, company performance, or behaviors

Example: A Data Handling Policy for Restricted Data

	Policy	Process	Technology
Storage	Can be stored in a specific directory for the period they are needed or as per customer contract requirements.	Communicate with your team and tell them where this data type should be stored. Provide a simple mechanism for getting help if unsure about data storage or if they spot data that seems mishandled.	Create a specific folder for the storage of analytics and metrics. If choosing an analytics platform or online tool, review their data storage policy, configuration options and retention settings.
Access	Only team members with a specific business need may access. Specific permissions are needed to update or edit.	Deny access by default. If a team member needs access, they must request it and that request must be recorded (for example, in a service ticket, email, or other document). Review access annually or on significant change to business or team.	Restrict access to only those team members with prior approval. Consider group-based permissions to simplify and improve consistency. For online tools, choose a tool that implements single sign-on or has a fine-grained permissions scheme that can be used to meet these requirements.
Sharing	Can be shared with other authorized team members. Not to be shared externally.	Educate your team on how to safely share data and why it’s important that some data is not shared. Review customer contracts to ensure they have explicit definitions around data sharing, storage, and retention. Review sharing regularly to ensure data is only shared for the period it needs to be shared.	Restrict data sharing so that data cannot be shared outside of the team or beyond named individuals or organizations. This could be done in tool configuration or as part of your identity and access management system.
Transport	Cannot be transported without prior approval and the removal of identifying or sensitive information. Must be sent over an encrypted connection.	Educate your team on how to send data of this kind safely. For engineering teams, define a standard or design pattern for data transport security.	Configure all tools and systems to use an encrypted connection such as HTTPS or SFTP. Restrict outbound and inbound traffic to company systems or networks to only allow authorized connections to specified systems. Implement monitoring and alerting to identify unusual data transportation, sharing, or connectivity patterns.
Destruction	Data must be destroyed in line with customer requirements and contracts.	Define a process where team members can request the destruction of data and this request is recorded.	Use a secure data destruction tool to ensure disks are sufficiently clean when decommissioned.
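To make the Access row concrete, here is a minimal sketch of deny-by-default access with recorded requests and an annual review check. The `AccessRegister` class and `review_due` function are hypothetical; in practice this job is usually done by a ticketing system together with your identity and access management tool.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Set


@dataclass
class AccessRegister:
    """Deny-by-default access list for one restricted data store."""
    approved: Set[str] = field(default_factory=set)
    requests: List[dict] = field(default_factory=list)

    def request_access(self, member: str, reason: str, on: date,
                       approved_by: Optional[str] = None) -> bool:
        """Record every request; grant access only when an approver is named."""
        granted = approved_by is not None
        self.requests.append({"member": member, "reason": reason, "date": on,
                              "approved_by": approved_by, "granted": granted})
        if granted:
            self.approved.add(member)
        return granted

    def can_access(self, member: str) -> bool:
        """Anyone not explicitly approved is denied."""
        return member in self.approved


def review_due(last_review: date, today: date) -> bool:
    """True when more than a year has passed since access was last reviewed."""
    return today - last_review > timedelta(days=365)
```

Because every request is appended to `requests` whether or not it is granted, the register doubles as the record the process column asks for.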

Respect the Classifications of Others

Many organizations share data with, or collect data from, customers or partners. These relationships require balance. Each party will have their own understanding of the value and sensitivity of the data, and their own expectations of how it should be handled throughout its life.

While you may have defined a classification system for information when it is stored and processed within your organization, you also need to understand the expected classification and requirements imposed on you by your partners or customers.

In some cases, particularly when dealing with larger organizations or government departments, these inherited or expected classification systems or expectations may be much more formal than your own.

To make these relationships successful (and safe):

Communicate clearly with your partners when establishing this relationship to understand these requirements and translate them into a language and approach that works within your organization.
Ensure in return that any differences in your systems or approaches are communicated to your partners so that they can understand and assess any risk that arises from this situation.

With classification settled, let’s look at how to ensure security is a key part of your software development process.

Building Security into Your Software Development Process


While almost all organizations rely on software in some way or another to get the job done, not all organizations need to build their own software. If you do, though, this chapter is for you.

This includes companies that sell their software (perhaps as a SaaS model) or those companies that need to use custom software internally but do not sell this software as part of their product or service offering (for example, a service organization that has built custom software to manage their scheduling, workflows, and billing). Some organizations build their own software, while others pay external companies to build software on their behalf. Whichever approach your company has taken, security needs to be front of mind throughout the process.
