Book description
For most companies, quality data is key to measuring success and planning for business goals. But achieving data accuracy and integrity can be a daunting task given the messy nature of data in the wild. How can you trust that source data is accurate? What data should be excluded as invalid? What steps can you take to ensure that all the data is transformed correctly? How do you know if your conclusions are accurate?
This report presents a case study from a large and critical data project at Spiceworks, the vibrant network, online community, and marketplace for IT professionals. Author Jessica Roper, a senior developer in Spiceworks’ data analytics division, demonstrates ways to think about data verification, processing, analysis, and automation. You’ll also get a guide to tools for determining whether the data you collect and use is reliable and accurate.
- Understand what’s involved in vetting data for trustworthiness
- Learn strategies and test cases for verifying raw data sources and working with transformations
- Become familiar with the data at each layer and create tests between each transformation to ensure consistency
- Understand which edge cases to look for, and what trends and outliers to expect
- Depend on data monitors to identify anomalies and system issues
- Automate process and acceptance tests to monitor and ensure reliability
- Work with other teams and groups to improve and validate data accuracy
- Increase adoption by using data to measure success
Table of contents
- A Guide to Improving Data Integrity and Adoption
- Validating Data Integrity as an Integral Part of Business
- Using the Case Study as a Guide
- An Overview of the Usage Data Project
- Getting Started with Data
- Managing Layers of Data
- Performing Additional Transformation and Formatting
- Starting with Smaller Datasets
- Determining Acceptable Error Rates
- Creating Work Groups
- Reassessing the Value of Data Over Time
- Checking the System for Internal Consistency
- Verifying Accuracy of Transformations and Aggregation Reports
- Allowing for Tests to Evolve
- Implementing Automation
- Conclusion
- Further Reading
Product information
- Title: A Guide to Improving Data Integrity and Adoption
- Author(s): Jessica Roper
- Release date: December 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491970515