Why You Need a Data Flow Diagram—and How to Create One
- Updated: Dec. 31, 2024
Do you know every path your data takes and every risk it encounters along the way? When you know where your data starts, where it goes, and who (and what) interacts with it, you gain the power to secure it, optimize your operations, and make smarter decisions.
A data flow diagram (DFD) maps your data’s journey, revealing hidden vulnerabilities, inefficiencies, and opportunities to strengthen your systems.
Even better, a DFD’s value isn’t just in having one. The process of creating the diagram will almost certainly reveal new things about how your data ecosystem works.
What Can a Data Flow Diagram Tell You?
A data flow diagram shows exactly how and where your data moves, so you can make sure every step is secure. Typically, a DFD will reveal things like:
- Hidden Risks: You may discover that sensitive data is duplicated or transferred across multiple systems, increasing your exposure to breaches.
- System Complexity: For organizations with many processes—aka most organizations—DFDs simplify understanding and managing complex data flows. We’ve worked with some organizations that require several DFDs to accurately show the entirety of what’s happening.
- False or Misleading Assumptions: Relying on verbal or written explanations of your system leaves room for ambiguity. A DFD eliminates guesswork.
Don’t think you’re covered just because someone on the team can verbally explain a system. They’re most likely leaving things out, and that person will not be around forever. Additionally, you shouldn’t count on a typed explanation of a system. A written description almost certainly includes enough ambiguity to let problems hide in the fine print.
Diagram Beyond Your Walls: Third Parties
Your data doesn’t stop at your organization’s perimeter—and neither should your DFD. Extending your mapping efforts to include third-party systems can reveal overlooked vulnerabilities, such as weak data transfer protocols or excessive access permissions.
At HBS, we often create an individual DFD for each third-party system that connects to a client’s organization. This clearly shows how data flows in and out of the third party’s environment and where it may need additional security.
» Real-World Example: While mapping your payroll system, you might discover your vendor sends data over unencrypted channels—a glaring security risk that could compromise employee information.
How to Create a Data Flow Diagram
Building a DFD involves three core steps, though the details will vary based on your overall cybersecurity maturity. Here’s how we do it at HBS:
1. Risk Assessment
Leadership—guided by cybersecurity consultants—reviews the potential risk factors facing their business. They identify systems already in place, cybersecurity protocols, corporate governance, connected vendors, and key business processes.
As you start building your DFD, be sure to look over the results of your most recent risk assessment for information on what you should map. This helps leadership identify risks, plan mitigating controls, and evaluate whether the remaining risk is acceptable.
2. Asset Inventory
Document the hardware, software, and data used throughout your business. That includes systems that store data at rest as well as systems that move it in transit.
- Hardware – The asset list should include any physical asset that may store or come in contact with an organization’s data. That includes computers, mobile devices, network equipment, printers, scanners, and more.
- Software – Every organization has a set of approved applications that typically includes financial applications, fixed contract applications, and licensed applications. The software inventory should include documentation of the software’s end of life.
- Data – Include a list of the data types stored, how each is stored, and who in the organization owns it.
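If you want the inventory in a form you can query, a handful of simple records is often enough to start. The sketch below is illustrative only: the `Asset` fields, the `past_end_of_life` helper, and the sample entries are assumptions made for this example, not an HBS template.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Asset:
    """One row of a hypothetical asset inventory."""
    name: str
    kind: str                           # "hardware", "software", or "data"
    owner: str                          # person or team accountable for the asset
    storage: Optional[str] = None       # for data assets: where and how it is stored
    end_of_life: Optional[date] = None  # for software: documented end-of-life date


def past_end_of_life(assets, today):
    """Return software assets whose documented end of life has already passed."""
    return [
        a for a in assets
        if a.kind == "software"
        and a.end_of_life is not None
        and a.end_of_life < today
    ]


# Sample entries are made up for illustration.
inventory = [
    Asset("Front-office scanner", "hardware", owner="IT"),
    Asset("Payroll application", "software", owner="Finance",
          end_of_life=date(2023, 6, 30)),
    Asset("Accounting suite", "software", owner="Finance",
          end_of_life=date(2030, 1, 1)),
    Asset("Employee records", "data", owner="HR",
          storage="encrypted database"),
]

expired = past_end_of_life(inventory, today=date(2025, 1, 1))
print([a.name for a in expired])
```

Keeping the end-of-life date next to each software entry makes it easy to spot applications that may no longer receive security updates before they show up on a diagram.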
3. Develop the Diagram
With all the information gathered, you can start mapping out your DFD. We recommend starting with a Level 0 DFD (a high-level overview) and then creating Level 1 diagrams for deeper process insights.
Universal Symbols of a DFD
A DFD uses a universally accepted set of symbols to portray information flow within and between network segments as well as through the institution's perimeter to external parties.
Your DFDs should identify:
- Data sets and subsets shared between systems.
- Applications sharing data.
- Classification of data (public, internal, confidential, restricted) being transmitted.
- How data is identified at rest and in transit.
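The items in that list can also be captured as data before (or alongside) the drawing. Here is a minimal sketch, assuming each arrow on the diagram becomes one record. The `DataFlow` class, its field names, and the sample flows are hypothetical, and treating "confidential" and "restricted" as the sensitive tiers is an assumption for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    """One arrow on a hypothetical DFD."""
    source: str                 # application or system sending the data
    destination: str            # application or system receiving the data
    data_set: str               # data set or subset being shared
    classification: str         # "public", "internal", "confidential", or "restricted"
    encrypted_in_transit: bool  # whether the transfer channel is encrypted


# Assumption: these two tiers are the ones that must always be encrypted in transit.
SENSITIVE = {"confidential", "restricted"}


def unprotected_sensitive_flows(flows):
    """Return flows that carry sensitive data over an unencrypted channel."""
    return [
        f for f in flows
        if f.classification in SENSITIVE and not f.encrypted_in_transit
    ]


# Sample flows are made up for illustration.
flows = [
    DataFlow("HR system", "Payroll vendor", "employee pay records",
             "confidential", encrypted_in_transit=False),
    DataFlow("Website", "CRM", "newsletter signups",
             "internal", encrypted_in_transit=True),
    DataFlow("Finance app", "Data warehouse", "ledger exports",
             "restricted", encrypted_in_transit=True),
]

for f in unprotected_sensitive_flows(flows):
    print(f"{f.source} -> {f.destination}: {f.data_set} "
          f"({f.classification}) is not encrypted in transit")
```

A list like this does not replace the diagram, but it gives you something to check the diagram against and makes gaps such as an unencrypted payroll transfer easy to surface.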
Several tools are available for creating a data flow diagram. Microsoft Visio, Microsoft Whiteboard, and Canva all offer ways to design simple, clear diagrams, and Microsoft publishes a how-to article on creating a DFD with Visio.
Who Is Using Your Data—and How?
While you create your DFDs, you’ll want to understand how two types of users touch your data.
» Pro Tip: Reducing unnecessary permissions is one of the simplest ways to lower the risk of breaches.
Vendors – Working with vendors inevitably poses a threat by allowing outside access to internal processes. Consider these factors about vendor access:
- Identification: Create a profile for each vendor, including name, address, key contacts, services provided, contract details, and expenses.
- Grouping: Categorize vendors to identify those considered “critical.”
- Level of Risk: Evaluate whether the vendor has access to information that could significantly impact the business.
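The three vendor factors above can be recorded in a simple profile. The sketch below is one possible shape, not a prescribed method: the `Vendor` fields cover only part of the profile described above, and the scoring rule in `risk_level` is an assumption invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Vendor:
    """A hypothetical vendor profile (a subset of the fields listed above)."""
    name: str
    services: str
    key_contact: str
    critical: bool                                   # result of the grouping step
    data_access: list = field(default_factory=list)  # classifications the vendor can reach


# Assumption: these classifications could significantly impact the business.
HIGH_IMPACT = {"confidential", "restricted"}


def risk_level(vendor):
    """Illustrative rule: combine criticality with access to high-impact data."""
    touches_high_impact = any(c in HIGH_IMPACT for c in vendor.data_access)
    if touches_high_impact and vendor.critical:
        return "high"
    if touches_high_impact or vendor.critical:
        return "medium"
    return "low"


# Sample vendors are made up for illustration.
vendors = [
    Vendor("Example Payroll Co.", "payroll processing", "account manager",
           critical=True, data_access=["confidential"]),
    Vendor("Example Print Shop", "marketing print jobs", "sales rep",
           critical=False, data_access=["public"]),
    Vendor("Example Cloud Backup", "offsite backups", "support desk",
           critical=False, data_access=["restricted"]),
]

for v in vendors:
    print(f"{v.name}: {risk_level(v)}")
```

Whatever rule you settle on, writing it down keeps vendor ratings consistent from one review to the next.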
Now that you know who is using your data, pivot to understanding how your data is being used. Identify whether employees, vendors, or systems are writing, modifying, storing, or processing the data. This clarity will help you protect sensitive information and accurately map your data’s journey.
Additionally, knowing the location of data interactions helps assess risks. For example:
- Lower Risk: Data moving between internal systems in the same building.
- Medium Risk: Data being transferred between two offices within the same country but across different networks or through an external cloud provider.
- Higher Risk: Data transfers between countries, particularly to regions with weaker data protection laws.
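The three tiers above can be expressed as a small rule. This is a rough sketch under simplifying assumptions: the `Endpoint` class and `transfer_risk` function are hypothetical, "site" stands in for a building and its network, and a real assessment would weigh more factors, such as the data protection laws of the countries involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Where one end of a data transfer lives (hypothetical model)."""
    country: str
    site: str  # assumption: one label for a building and its network


def transfer_risk(src, dst, via_external_cloud=False):
    """Map a transfer to the lower / medium / higher tiers described above."""
    if src.country != dst.country:
        return "higher"   # data crosses a national border
    if via_external_cloud or src.site != dst.site:
        return "medium"   # same country, but different sites or an external cloud
    return "lower"        # internal systems at the same site


# Sample locations are made up for illustration.
hq = Endpoint(country="US", site="Headquarters")
branch = Endpoint(country="US", site="Branch office")
overseas = Endpoint(country="DE", site="Overseas office")

print(transfer_risk(hq, hq))
print(transfer_risk(hq, branch))
print(transfer_risk(hq, hq, via_external_cloud=True))
print(transfer_risk(hq, overseas))
```

Tagging each flow on the diagram with its tier helps you decide where to look first when reviewing controls.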