Foundations of Data Management Blog Series
Unlocking insights from data is the key to business transformation and success in the digital age. In this new series of blog posts, Taos Cloud Engineers James Leone and John Evans deep-dive into the pillars of a strong data management infrastructure and how Taos approaches building this with our clients.
Blog #1: The Four Pillars of the Data Landing Zone
Data is the New Gold
The power of data and its value to the digital economy have been top of mind for the last decade. Organizations that leverage available data to its fullest potential gain value from insight to innovation, while organizations that mismanage their data can experience declining market share and stagnation. Recognizing the business value of this asset, and the opportunity cost of not treating it as a rich resource to be mined, has sharply increased the importance of data management: what data do you store, how do you store it, who can access it, and what do you do with it?
Strong data management is vital for maintaining access controls to data both inside and outside of an organization, and it prevents overzealous data retention, which can bring sky-high storage costs and legal liabilities. It’s also what makes results from data meaningful. Without strong management, your results might be questionable or inaccurate, be out of date and therefore irrelevant, or never reach the people who need them. Strong end-to-end management of data quality ensures that the results produced are correct, fresh, and accessible, all of which unlocks the value of the data for the business.
Data in the Cloud
The industry pivot to cloud platforms has changed data management, because on-premises data management is fundamentally different from data management in the cloud. On-prem data is completely under your control, from physical access to managing request patterns. Cloud data, however, is managed in a shared responsibility model: the cloud provider is expected to ensure that data is encrypted and stored securely, but you define your user and consumer access to data. The ball is in your court to ensure that your data is both managed correctly and fit for use; many organizations have been burned by skipping over these key points. The pitfalls of mismanaging data and data access are legion. Some examples:
- External data breaches from storing data incorrectly, such as in misconfigured storage containers or in an object store like AWS S3 that gets left open to the public
- Internal data breaches from ill-defined or non-existent access models, because before you give anyone access to cloud data, you have to define how to grant, monitor, audit, and remove access
- Shocking cloud storage bills from terabytes of redundant or poorly structured data
- Lost credibility and opportunities based on faulty results and misinformation mined from poor quality data
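To make the access-model pitfall concrete, here is a minimal sketch of what "grant, monitor, audit, and remove access" could look like in code. The `AccessRegistry` class, the user and dataset names, and the in-memory storage are all hypothetical and chosen for illustration; a real deployment would rely on your cloud provider's identity and access management service rather than anything like this.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessRegistry:
    """Hypothetical in-memory access model: grant, check, audit, revoke."""

    # Maps (user, dataset) to the set of actions that user may perform.
    grants: dict = field(default_factory=dict)
    # Append-only list of every grant, check, and revoke event.
    audit_log: list = field(default_factory=list)

    def _record(self, event, user, dataset, detail):
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "user": user,
            "dataset": dataset,
            "detail": detail,
        })

    def grant(self, user, dataset, actions):
        self.grants.setdefault((user, dataset), set()).update(actions)
        self._record("grant", user, dataset, sorted(actions))

    def revoke(self, user, dataset):
        self.grants.pop((user, dataset), None)
        self._record("revoke", user, dataset, None)

    def is_allowed(self, user, dataset, action):
        # Deny by default: anything not explicitly granted is refused.
        allowed = action in self.grants.get((user, dataset), set())
        self._record("check", user, dataset, {"action": action, "allowed": allowed})
        return allowed


registry = AccessRegistry()
registry.grant("analyst", "sales_2023", {"read"})
can_read = registry.is_allowed("analyst", "sales_2023", "read")
can_write = registry.is_allowed("analyst", "sales_2023", "write")
registry.revoke("analyst", "sales_2023")
after_revoke = registry.is_allowed("analyst", "sales_2023", "read")
```

The point of the sketch is the shape, not the implementation: access is denied unless granted, every decision leaves an audit record, and removal is a first-class operation rather than an afterthought.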
The Data Lifecycle
So how do you avoid these pitfalls? First, consider the data lifecycle. Well-managed data is ingested, stored according to usefulness and/or regulations, shared with a minimum set of consumers, and then destroyed.
Ingestion: Data has to be ingested according to structure with the right resources and processed for storage and access.
Storage: Storing data for later access is vital, but data in the cloud needs to be stored by access frequency to avoid wasted cloud spend and security risks. Data should be sorted into hot for frequently accessed data, warm for occasionally accessed data, cold for rarely accessed data, and archive for compliance-related retention.
Sharing: Once data of value has been stored, it is vital to identify a sharing model that provides a minimum level of access to its consumers. Applying the Principle of Least Privilege helps to ensure that data isn’t overshared and subsequently leaked. Remember, the weakest link in any organization’s security posture is its users.
Destruction: Destroy outdated data securely and at the right time. Ideally, the destruction policy is codified before the data is stored.
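The storage and destruction stages lend themselves to being written down as rules. The sketch below is one way to express them, under stated assumptions: the 30-day and 180-day thresholds, the function names, and the three-year retention period are invented for illustration and are not recommendations. Your own thresholds should come from observed access patterns and applicable regulations.

```python
from datetime import date

# Assumed thresholds, in days since last access. Illustrative values only.
HOT_MAX_IDLE_DAYS = 30
WARM_MAX_IDLE_DAYS = 180


def storage_tier(last_accessed, today, compliance_only=False):
    """Pick a storage tier from how recently the data was accessed."""
    if compliance_only:
        # Data kept purely to satisfy retention rules goes to archive.
        return "archive"
    idle_days = (today - last_accessed).days
    if idle_days <= HOT_MAX_IDLE_DAYS:
        return "hot"
    if idle_days <= WARM_MAX_IDLE_DAYS:
        return "warm"
    return "cold"


def is_due_for_destruction(created, retention_days, today):
    """True once the data has outlived its codified retention period."""
    return (today - created).days > retention_days


today = date(2024, 1, 1)
tier_recent = storage_tier(date(2023, 12, 20), today)
tier_idle = storage_tier(date(2023, 9, 1), today)
tier_stale = storage_tier(date(2022, 1, 1), today)
tier_held = storage_tier(date(2023, 12, 20), today, compliance_only=True)
expired = is_due_for_destruction(date(2020, 1, 1), retention_days=3 * 365, today=today)
```

Defining the retention period alongside the tiering rule is the code-level equivalent of deciding on destruction before the data is ever stored.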
The benefits of a well-managed data lifecycle make for a very long list, but here are some key points:
- Data quality is maintained for consistent, clean, and accurate results
- Data is delivered in the shape and form that’s consumable by those who want to gain insights
- Cost control is achieved by eliminating junk and not paying for excess storage
- Access controls and appropriate security levels are maintained to ensure privacy and data safety
- Change control, change management, and audit history are possible, preserving the chain of custody and data records
- Data sovereignty and residency requirements are met while also allowing your policies to evolve with laws and regulations that might be enacted or changed in the future
A system for making every step in the data lifecycle happen correctly is DataOps. This involves groups of individuals from many teams working with data scientists, engineers, and analysts to manage ongoing ingestion, scrubbing, and destruction, using the same statistical improvement model as assembly lines. According to DataKitchen, “you can view DataOps in the context of a century-long evolution of ideas that improve how people manage complex systems. It started with pioneers like W. Edwards Deming and statistical process control – gradually these ideas crossed into the technology space in the form of Agile, DevOps and now, DataOps.” (1)
To have effective data lifecycle management and the statistical improvement benefits of DataOps, you need a strong foundation, and this all starts with the data landing zone. You have to build before you can run.
What Is a Data Landing Zone?
A data landing zone is the framework built on your cloud platform that forms the core infrastructure of your data management system, including access policies, networking, data transfer, storage systems, and more. This specialized landing zone is a combination of technical and business process controls built on four distinct pillars, and – when constructed well – supports your overall business policies and processes, enabling leadership to maintain data oversight and confidence about what is retained and available for use.
What Are the Four Pillars?
The four pillars are key considerations and steps that support the design, implementation, and management of your data landing zone (and ultimately your data). Taking the time to get these right will maximize the benefit you get from your data.
The Four Pillars of the Data Landing Zone
|Pillar|What it covers|
|---|---|
|Data Ingestion|Determine the source, velocity, volume, and variety of your data and the best way to manage intake|
|Data Governance|Define the purpose and intent for the data being used, applicable regulations, and security controls to restrict access|
|Data Transformation|Process incoming data from existing structures into forms that can provide focused analysis and useful comparisons|
|Data Consumption|Use processed data to unlock insights and inform business decisions through direct analysis, AI/ML models, and more|
The Foundations of Data Management Blog Series will explore each of the four pillars of the data landing zone in detail and then finally what you can do with the power of your unlocked data. Next up is data ingestion: what it is, what it isn’t, and how to design your landing zone with the right resources to process your data.