[db-wg] Removing personal data from bulk output from the RIPE Database
Peter Koch
pk at DENIC.DE
Wed May 8 14:51:57 CEST 2013
Denis,

> We've received feedback from different users and researchers that we are
> overdoing the dummification. For example, one can obtain all references
> to personal objects without hitting any personal object result limits by
> querying the live RIPE Database with proper flags (like -r). This makes
> the "dummification" of these references in the data dumps meaningless.

I do not buy this argument. We know that certain access restrictions can
eventually be circumvented by renting the ultimate botnet and doing a mass
harvest. That doesn't render restrictions useless. One could argue that
if certain access controls were implemented to achieve a certain goal and
other methods open a path around these controls, those other methods (the
-r flag in this case) ought to be reviewed instead.

> In order to improve the usability of the data dumps and streams, we are
> proposing to change the "dummification" algorithm to keep the actual
> personal objects and all references to them and only obfuscate the
> fields with personal data (for example real names, phone numbers and
> addresses). The new algorithm will also try to preserve data that is
> useful for researchers, while not revealing any data that might expose
> the identity of the data subject. For example, we are proposing to keep
> the first half of phone number digits or to keep the domain part of
> email addresses.

I am missing a list of the data protection goals that the original
implementation was meant to meet, and a serious assessment of why the
proposed changed method would still meet them. I doubt that obfuscating
the local part of an email address is an adequate measure of
anonymization or pseudonymization. Similar concerns hold for phone
numbers. On a meta level, mangled data is more of a threat to real data
than replaced data is.

FWIW, I don't see the special case for 'abuse-mailbox'.

By optimizing the 'dummification algorithm' around fuzzy criteria, it
occurs to me, we're putting the cart before the horse.

-Peter
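
For illustration only, the masking described in the quoted proposal
(keep the first half of the phone number digits, keep the domain part of
the email address) might look roughly like the sketch below. The function
names, the placeholder characters and the sample values are assumptions
made for this example; they are not taken from the RIPE Database
implementation.

```python
def mask_phone(phone: str) -> str:
    """Keep the first half of the digits; replace the remaining digits with '.'.

    Non-digit characters (such as '+' and spaces) are left untouched.
    Hypothetical helper, not the actual dummification code.
    """
    total_digits = sum(ch.isdigit() for ch in phone)
    keep = total_digits // 2  # assumption: round down for odd digit counts
    seen = 0
    out = []
    for ch in phone:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= keep else ".")
        else:
            out.append(ch)
    return "".join(out)


def mask_email(email: str) -> str:
    """Replace the local part with '***' and keep the domain part.

    Hypothetical helper, not the actual dummification code.
    """
    local, sep, domain = email.rpartition("@")
    if not sep:
        # No '@' found: nothing recognisable to preserve.
        return "***"
    return "***@" + domain


if __name__ == "__main__":
    print(mask_phone("+31 20 555 0100"))
    print(mask_email("john.doe@example.org"))
    # A personal domain survives the masking unchanged:
    print(mask_email("jane@doe-family.example"))
```

The last call shows the concern raised above in concrete terms: when the
domain itself identifies a person or a small group, masking only the
local part leaves the identifying portion in the output.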