Strava’s heatmap revealed military bases, but it also showed nothing is anonymous online
Around Melbourne’s Albert Park, Strava’s heatmap glows gold with the efforts of cyclists and runners. But outside the city, there are fainter, more solitary lines. Could they trace a lone jogger’s route to her front door?
The popular exercise tracking app recently made headlines when a world map of its users’ fitness routines was found to reveal secretive military locations and troop movements.
In 2013, for example, researchers showed that just four known locations at given points in time can be enough to identify 95 per cent of individuals in a dataset.
After the potential risks lit up the internet, Strava said its heatmap contained only an “aggregated and anonymised” view of its data.
But can I find you in “aggregated and anonymised” data? Probably, if I know how to look.
It’s all about context
As Strava’s heatmap shows, data are “anonymous” only until they are linked up with other contextual information. Say, a known military zone.
If you zoom in on an otherwise dark area in the Middle East, explained John Scott-Railton, a senior researcher at the Citizen Lab at the University of Toronto, you might find a forward operating base.
Narrow in, and you may see tracks made by soldiers on patrol.
Or you could locate embassies and look for popular paths between those buildings and diplomatic residences, as he outlined on his blog.
This impact isn’t necessarily limited to military personnel.
Say you find a solitary heatmap line in a rural or remote area. Although Strava’s satellite imagery is relatively low quality, you could look up a more detailed image on Google Maps.
“You can see exactly where the houses are. You can figure out what street number it is,” explained Vanessa Teague, a senior lecturer at Melbourne University’s School of Computing and Information Systems.
This could be significant for those who have good reason to keep their location secret.
Anonymous doesn’t always mean anonymous
Most online services collect data about you — your name, your computer’s IP address and much more.
Some companies sell this data to advertisers, but they are careful to tell you they take out “personally identifying information”, otherwise known as PII.
This could be your street address or date of birth, but removing PII is not always enough.
Not when data can be linked with other information about you, whether it is a holiday you shared on Facebook or the LinkedIn data breach of 2012.
In late 2017, Dr Teague and her team reported that they could identify famous individuals within an “anonymised” Medicare dataset, simply by comparing it with news of sporting injuries or pregnancies. The dataset has been taken offline.
Within the Medicare dataset, each combination of gender, state and year of birth still matched a large cohort of people.
But being in a crowd doesn’t always protect you. To illustrate, Dr Teague looked for herself in the data. As was reported at the time:
More than 17,000 women in the dataset matched her year of birth, but when the years of birth of two of her children were added, only 59 possible matches remained. And only 23 in her home state of Victoria. Adding their specific days of birth brought the possible matches to zero.
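The narrowing Dr Teague describes can be sketched as a simple chain of filters over a toy dataset — every record, name and number below is invented for illustration, not drawn from the real Medicare data:

```python
# Toy re-identification sketch: each publicly known fact about a person
# (a quasi-identifier) removes candidates until few, or one, remain.
# All records here are fabricated.
records = [
    {"id": "A", "birth_year": 1975, "state": "VIC", "child_years": (2004, 2007)},
    {"id": "B", "birth_year": 1975, "state": "NSW", "child_years": (2004, 2007)},
    {"id": "C", "birth_year": 1975, "state": "VIC", "child_years": (2003, 2007)},
    {"id": "D", "birth_year": 1982, "state": "VIC", "child_years": (2004, 2007)},
]

# Step 1: year of birth alone leaves a crowd.
candidates = [r for r in records if r["birth_year"] == 1975]
print(len(candidates))  # 3

# Step 2: the birth years of two children shrink it further.
candidates = [r for r in candidates if r["child_years"] == (2004, 2007)]
print(len(candidates))  # 2

# Step 3: home state leaves a unique match.
candidates = [r for r in candidates if r["state"] == "VIC"]
print(len(candidates))  # 1
```

No single attribute is identifying on its own; it is the intersection that singles a person out.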
There was no explicit PII in Strava’s heatmap, but Dr Teague said the promise of anonymity in these kinds of datasets was a “furphy”.
People have been able to identify locations, patterns of life and even individuals.
“A series of data points about where you were or what medical issues you had or who you communicated with or what web searches you did — all of this kind of data could allow you to be reidentified,” she said.
Not a personal issue but a public issue
As Mr Scott-Railton pointed out, privacy policies and privacy approaches are designed mostly to minimise company risk and liability rather than protect users.
Casey Ellis, founder of the security company Bugcrowd, suggested those who deal in consumer data may need to think in a more adversarial manner.
“When people design software, they build it around use cases,” he explained. “What they don’t consider is the abuse cases.”
Consider Strava’s heatmap.
While aggregated running data in busy areas like Melbourne may not do much harm, Strava could have set a higher threshold — hiding map segments traced by only a handful of users — to protect both military personnel and civilians in remote locations.
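One way such a threshold could work is to publish a map cell only if enough distinct users contributed to it. A minimal sketch follows — the cutoff of 10 and the data are invented, and Strava’s actual pipeline is not public:

```python
# Hypothetical minimum-contributor threshold for a heatmap.
# A cell is published only if at least MIN_USERS distinct athletes
# recorded activity there. The value 10 is arbitrary.
MIN_USERS = 10

# cell name -> set of distinct user ids who logged activity there (toy data)
cell_users = {
    "busy_park": {f"user{i}" for i in range(250)},
    "remote_road": {"user991", "user992"},  # a lone household's jogging route
}

published = {
    cell: len(users)
    for cell, users in cell_users.items()
    if len(users) >= MIN_USERS
}
print(published)  # {'busy_park': 250}
```

The busy park survives, while the solitary route that could lead to someone’s front door is suppressed.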
Ultimately, however, we may need regulators to step in — our data points are just too sensitive and too important.
As Zeynep Tufekci, an associate professor at the School of Information and Library Science at the University of North Carolina, wrote in response to the Strava debate:
“Data privacy is not like a consumer good, where you click ‘I accept’ and all is well. Data privacy is more like air quality or safe drinking water, a public good that cannot be effectively regulated by trusting in the wisdom of millions of individual choices. A more collective response is needed.”
In the meantime, know that when companies tell you data are “anonymous and aggregated,” the story is incomplete.