Finding a needle in a haystack. Testing Big Data

What if you had to deal with a data warehouse containing trillions of records? What if scanning a single table took a few days, the number of data sources kept growing, and together they produced over 3 billion new records every day? How do you assure the sanity of your data when you make several releases per day? How do you tell whether it is still correct? The same way spacecraft explore entire planets with limited resources! We have found a series of approaches that have already proven their efficiency. In my presentation I will explain how we assure data consistency and how we learn to understand our data.

Audience level
Regular Talk (40 min)
