Case Proposal: The service collects data from your voice when you communicate with a service verbally

lxda · May 3, 2021, 9:12pm

I propose the following data to be a new case:

Fields	Data
Name	The service collects data from your voice when you communicate with a service verbally
Description	The service collects data and recordings of your voice when you communicate with a service verbally and may use them to improve its services or for commercial purposes.
Classification	blocker
Topic	Topic Types of Information Collected (ToS;DR Phoenix)
Weight	50

Agnes_de_Lion · May 4, 2021, 6:54am

Should it be separated from Case 397: Your biometric data is collected?

There aren’t many points linked to Case 397, so I think creating a whole new case for this kind of biometric collection will overcrowd the case list and make it harder for reviewers to find the right case to link a quote to.

lxda · May 4, 2021, 7:37am

For me, biometric data is more like fingerprint and facial recognition.

My argument is mainly that big companies make voice assistants available and collect voice data from their users.

Especially for Siri, Bixby, Alexa, Google Assistant, etc.

And I think it’s important to make the difference between biometric data and voice data which can sometimes be very sensitive.

What do you think about it?

justin · May 4, 2021, 7:41am

Agree that voice is not biometric data. Didn’t we also want to move away from the phrase “the service”?

lxda · May 4, 2021, 7:45am

For me, whether it is “this service” or “the service” doesn’t change much, as I try wherever possible to change the titles by putting the name of the service instead.

justin · May 4, 2021, 7:47am

The discussion was about moving away from that phrasing to make it cleaner and easier to read.

“Data about your voice is collected” - Not only is it shorter, it’s much easier to read.

Note: The case names appear in the extension so we don’t want to clutter it.

lxda · May 4, 2021, 5:28pm

Okay,

I agree that “Data about your voice is collected” is much easier to read.

lxda · May 5, 2021, 8:41pm

I am raising the subject again with regard to this case.

Do we add it?

Agnes_de_Lion · May 5, 2021, 8:48pm

Maybe we should wait a few days for other opinions to be expressed.

Arlo · May 5, 2021, 8:55pm

Voice prints are totally biometric data. However, I get the sense the proposal isn’t asking about [re]identification, but rather about ‘ownership’ of the voice data? Some companies have been caught generating TTS systems using consumer data from collection occurrences they didn’t notify people of, for instance.

lxda · May 5, 2021, 9:14pm

Voice prints are totally biometric data. However, I get the sense the proposal isn’t asking about [re]identification,…

Mmmhhh…

Yes, I think that in order to keep a basis for this proposal, we could change the title to
“Your voice data can be used for many purposes”

lxda · May 8, 2021, 10:48am

To reopen the subject, I found this in the Duolingo privacy policy:

“To recognize speech your audio may be sent to a third party provider such as Google, Apple, or Amazon Web Services.”

justin · May 8, 2021, 11:01am

Isn’t the main issue that it’s the service’s business model? Assigning a blocker to a language learning service that collects voice data to function doesn’t seem right to me.

lxda · May 8, 2021, 11:05am

No, but for me, the user must know that when they speak, their voice data is processed by Google, Amazon or Apple.

justin · May 8, 2021, 11:07am

You forgot the important bit in the privacy policy though:

We may ask you to allow Duolingo to collect and analyze your speech data to help us understand the effectiveness of our lessons, and to improve the product.

This is what qualifies it as a blocker. It’s all about the context.

justin · May 8, 2021, 11:09am

To add to this, we have case-394 which states “that it makes sense for the service” and is neutral.

We have to keep that in mind as well. That’s why I think case-400 is wrong too, as Google Maps or any geolocation provider would have a worse grade just because they exist.

Agnes_de_Lion · May 8, 2021, 12:28pm

Case 400 is supposed to be only assigned to services that don’t rely on geo-location, according to its description:

Unless the service relies on Geo Location, this case is to be assigned to points that don’t need your GPS coordinates to function properly.

lxda · May 8, 2021, 1:40pm

Or we can change the name to “Data from the voice is processed by third parties” and classify it as bad instead of blocker.

PS: Duolingo is similar to Quizlet and the speaking part is not at all necessary for learning. So in my opinion, it is important to warn the user that their voice is being processed by web giants known for their unethical practices.

Agnes_de_Lion · May 14, 2021, 12:34pm

To summarize, this would be the new case proposal:

Fields	Data
Name	Voice data is collected and shared with third-parties
Description	The service collects data and recordings of your voice when you communicate with a service verbally and may use them and share them with third parties that may use this data for marketing or advertising.
Classification	bad
Topic	Topic Types of Information Collected (ToS;DR Phoenix)
Weight	50

I’ve updated the description to match the new title.

If there are no other thoughts on this, I’ll add the case!
Edit: The case is now in Phoenix: Case 489: Voice data is collected and shared with third-parties

justin · May 14, 2021, 12:35pm

Yep that sounds great!