Reddit reviews Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World

We found 4 Reddit comments about Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World. Here are the top ones, ranked by their Reddit score.

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World
W. W. Norton & Company

4 Reddit comments about Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World:

u/OhTheHugeManatee · 23 points · r/explainlikeimfive

Worthwhile sidebar: "anonymized" data is almost never actually anonymous. Sorry for the extensive quote, but it's really relevant here. From Bruce Schneier's excellent book, Data and Goliath:

> "Most techniques for anonymizing data don't work, and the data can be de-anonymized with surprisingly little information.

> "In 2006, AOL released three months of search data for 657,000 users: 20 million searches in all. The idea was that it would be useful for researchers; to protect people's identity, they replaced names with numbers. So, for example, Bruce Schneier might be 608429. They were surprised when researchers were able to attach names to numbers by correlating different items in individuals' search history.

> "In 2008, Netflix published 10 million movie rankings by 500,000 anonymized customers, as part of a challenge for people to come up with better recommendation systems than the one the company was using at that time. Researchers were able to de-anonymize people by comparing rankings and time stamps with public rankings and time stamps in the Internet Movie Database.

> "These might seem like special cases, but correlation opportunities pop up more frequently than you might think. Someone with access to an anonymous data set of telephone records, for example, might partially de-anonymize it by correlating it with a catalog merchant's telephone order database. Or Amazon's online book reviews could be the key to partially de-anonymizing a database of credit card purchase details.

> "Using public anonymous data from the 1990 census, computer scientist Latanya Sweeney found that 87% of the population in the United States, 216 million of 248 million people, could likely be uniquely identified by their five-digit ZIP code combined with their gender and date of birth. For about half, just a city, town, or municipality name was sufficient. Other researchers reported similar results using 2000 census data.

> "Google, with its database of users' Internet searches, could de-anonymize a public database of Internet purchases, or zero in on searches of medical terms to de-anonymize a public health database. Merchants who maintain detailed customer and purchase information could use their data to partially de-anonymize any large search engine's search data. A data broker holding databases of several companies might be able to de-anonymize most of the records in those databases.

> "Researchers have been able to identify people from their anonymous DNA by comparing the data with information from genealogy sites and other sources. Even something like Alfred Kinsey's sex research data from the 1930s and 1940s isn't safe. Kinsey took great pains to preserve the anonymity of his subjects, but in 2013, researcher Raquel Hill was able to identify 97% of them.

> "It's counterintuitive, but it takes less data to uniquely identify us than we think. Even though we're all pretty typical, we're nonetheless distinctive. It turns out that if you eliminate the top 100 movies everyone watches, our movie-watching habits are all pretty individual. This is also true for our book-reading habits, our Internet-shopping habits, our telephone habits, and our web-searching habits. We can be uniquely identified by our relationships. It's quite obvious that you can be uniquely identified by your location data. With 24/7 location data from your cell phone, your name can be uncovered without too much trouble. You don't even need all that data; 95% of Americans can be identified by name from just four time/date/location points.

> "The obvious countermeasures for this are, sadly, inadequate. Companies have anonymized data sets by removing some of the data, changing the time stamps, or inserting deliberate errors into the unique ID numbers they replaced names with. It turns out, though, that these sorts of tweaks only make de-anonymization slightly harder.

> "This is why regulation based on the concept of 'personally identifying information' doesn't work. PII is usually defined as a name, unique account number, and so on, and special rules apply to it. But PII is also about the amount of data; the more information someone has about you, even anonymous information, the easier it is for her to identify you."

So I would remove the first part of your explanation, and just go with "it's basically making what they are already doing/have been doing for who knows how long legal." It gives the government explicit permission to collect all your Internet activity and searches.
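
To make the correlation idea in the quote concrete, here is a minimal sketch of a linkage attack on ZIP code, gender, and date of birth. Everything in it is hypothetical: the records, the names, the field names, and the helper functions `unique_fraction` and `link` are made up for illustration and do not come from the book or from any real data set.

```python
from collections import Counter

# Quasi-identifiers: fields that are not names but can single a person out.
QUASI_IDENTIFIERS = ("zip", "gender", "dob")

# Hypothetical "anonymized" release: names replaced with numeric IDs.
anonymized = [
    {"id": 608429, "zip": "02139", "gender": "M", "dob": "1963-01-15", "query": "flu symptoms"},
    {"id": 608430, "zip": "02139", "gender": "F", "dob": "1970-07-02", "query": "asthma inhaler"},
    {"id": 608431, "zip": "02139", "gender": "F", "dob": "1970-07-02", "query": "migraine relief"},
]

# Hypothetical public data set that still carries names.
public = [
    {"name": "Pat Example", "zip": "02139", "gender": "M", "dob": "1963-01-15"},
    {"name": "Sam Sample", "zip": "02139", "gender": "F", "dob": "1970-07-02"},
]


def key(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_fraction(records):
    """Fraction of records whose quasi-identifier combination is unique."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)


def link(anonymized_records, public_records):
    """Map anonymized IDs to names where the match is unambiguous on both sides."""
    anon_counts = Counter(key(r) for r in anonymized_records)
    names_by_key = {}
    for person in public_records:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = {}
    for record in anonymized_records:
        names = names_by_key.get(key(record), [])
        if anon_counts[key(record)] == 1 and len(names) == 1:
            matches[record["id"]] = names[0]
    return matches


if __name__ == "__main__":
    print(unique_fraction(anonymized))
    print(link(anonymized, public))
```

In this toy data only the first record is unique on the three fields, so it is the only one that gets a name attached. The two records that share a ZIP code, gender, and birth date stay ambiguous, which is why the quote talks about partial de-anonymization.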

u/oiwot · 1 point · r/IAmA

Well said. I strongly encourage anyone even vaguely interested to read Bruce Schneier's latest book Data & Goliath which explores this.

u/EuanB · 1 point · r/australia

I know

The point is, you were called on a bad example. Instead of graciously accepting that it was a bad example, you went 'wah wah wah there are other ways.' You're not wrong, but failing to acknowledge valid criticism of your point is poor form.