Describing the Privacy of Complex Things is Complex… so is testing black box behavior of same, both could do better.

Recently the Mississippi Attorney General sued Google, revisiting some of the same claims that the EFF made in late 2015, alleging that Google is mining student data in violation of agreements and the student privacy pledge.

The title of this post is my TL;DR summary of an excellent post by Bill Fitzgerald, the Privacy Initiative Director at Common Sense Media. It raises an important point: just as it is important that vendors providing EdTech services be accurate, transparent and comprehensible about what is happening with user data, it is equally important to hold those who criticise, advocate, lobby, and enforce privacy to similar standards.

Based on the information currently available, the Mississippi AG lawsuit does not appear to meet this standard.

1. The lawsuit lacks specific evidence of any actual data mining. This was pointed out in the ED Week article by Benjamin Herold, where he says:

“The Mississippi attorney general’s office, meanwhile, has provided only limited information about how it determined that Google is tracking students, using their data to build profiles, and targeting them with ads. Officials “tested” …[but] declined to provide any details about the nature of those tests, citing their ongoing investigation. The lawsuit itself contains no information demonstrating that any of Google’s allegedly deceptive practices actually occur.” 

This is also borne out in the FAQ, which says:

Q: What information is Google collecting?
A: It is unclear at this time exactly what information Google is collecting from its GSFE users. Through this lawsuit, the Attorney General seeks to uncover exactly what information Google is accessing and collecting. The lawsuit also seeks information as to how Google is using that data.

2. The allegations about Chrome Sync are both technically incorrect and refer to functionality (syncing passwords, browser history, bookmarks, etc.) that is similar to functionality that exists in nearly every modern browser and operating system. For reference, see both Google’s response to Sen. Franken and the Chrome help page for adding a “trust no one” passphrase that prevents Google from reading sync data; the descriptions of what does not work once this is done make it very clear what the data is used for.

3. The references to non-core services ignore the clear statements that Google makes to schools (in the terms and in the admin console) that schools are responsible for obtaining parental approval for all users under 18 prior to enabling a non-core service. One question that has not been answered in Mississippi is whether the student accounts the AG used had YouTube enabled for students and, if so, whether the school obtained parental permission.

4. On the claim that the Google policies are complex and in places contradictory, I’d point folks to the EDU privacy notice (https://gsuite.google.com/terms/education_privacy.html), which is a short (<1200 word), easy-to-read document that summarizes the policies, provides answers that would lead one to believe the Mississippi lawsuit got the facts wrong, and very clearly addresses the concern about multiple conflicting policies by saying:

“Where there are terms that differ, as with the limitations on advertising in G Suite for Education, the G Suite for Education agreement (as amended) takes precedence, followed by this Privacy Notice and then the Google Privacy Policy.”

As far as the policies being complex, yes, that is a fair point, because it is a complex system, and yes, there are areas for improvement. One very clear area I’d point to is where the word “privacy” links to, which differs between consumer and GSuite accounts.

privacy-compare

5. In a video clip of an interview with journalist Anna Wolfe, Hood makes the claim that his office looked at “some other class action lawsuits that Google settled where they were in fact mining data of children”. No details were provided, and I cannot identify which “class action settlements” he was referring to. The most likely one (Matera v. Google) appears to have been modified so that it does not include Google Apps for Education. The settlement document says:

“Subsequently, on October 17, 2016, Plaintiff Matera filed an Amended Complaint (ECF No. 58), …… eliminating allegations pertaining to Google Apps”

6. As long as we are on the subject of court settlements and prior bad acts, it is worth remembering that a federal court shut down AG Hood’s abuse of authority in a prior case against Google, after a series of Pulitzer Prize-winning articles on how the influence of lobbyists can sway congressional leaders and state attorneys general.

Some privacy and transparency areas that Google could improve on include:

  1. Disabling all non-core Google services by default for newly created GSuite for Education domains.
  2. Specifically clarifying what takes precedence for schools: the Admin notice that it is the school’s responsibility to get permission from parents for students under 18 (and therefore under 13) to use services such as YouTube, Google+, etc., or the corresponding language in the terms that prohibits use by those under 13 in these services (e.g. YouTube, Google+ and the Google Chrome Store).
  3. Requiring developers to post links to terms and privacy policies in their listing in the Chrome Apps Store, and conspicuously displaying the link.
    • Require the same for apps discovered through Google Drive’s “connect more apps” feature.
    • Require the same for 3rd party “Google add-ons” for Sheets, Docs and Forms. This last is particularly important as the user interface presents access to these 3rd party services from a menu within a document or spreadsheet. This has the potential to create confusion over what is a Google product. Also, since these services are listed within a tool (Google Drive) that is provided by the school, it may create the impression that these tools are recommended, vetted, sanctioned or approved by the district.
    • This is shown below: the Drive app Pear Deck has a link to policies; the Docs add-on EasyBib does not.

  4. Clarifying the behavior of data collection for GSuite EDU users that are:
    1. Logged into GSuite but have YouTube disabled by the Admin
    2. Logged into GSuite but have YouTube enabled by the Admin

An example that raises this question is the network traffic when a non-logged-in user searches YouTube: traffic appears to be going to Google search services.

youtubecapture

Non-Core Services Enabled by Default in GSuite EDU

As part of an attempt to do a methodical test of the “claims” recently made by the Mississippi AG, I started from what should be the logical first step: creating a new GSuite for Education domain. One important thing to note is that when doing this (as of 1/17/17), 25 of the 52 non-core services are turned on by default (list attached). A fair criticism might be that the better privacy-by-default practice would have been to default all of these to OFF and require the Admin to enable them.

The flow looks like this: as soon as an admin signs up, they must verify the domain (creating a DNS entry to prove they have control). As soon as they do this, they get the following message:

default-1

It is worth noting that YouTube is enabled by default on new GSuite EDU domains (as of testing on /19/17), despite the under-13 prohibition in the terms, though location, web history and Google+ are not. Blogger is on by default for all users.

An example of the “notice and consent” for Admin when enabling new services can be seen below.

 

default-3

 

A fair point could be made about contradictory terms, and it is reasonable to think that districts might be confused about what takes precedence: whether the pop-up dialog the admin clicks through overrides the general prohibition on under-13 use in the YouTube terms (the same is true for the Chrome Web Store, which also prohibits users under 13).

Also, the default setting for new products is to enable them:

default2

 

The following table lists the default status of non-core services in a new EDU GAFE domain upon domain activation (as of 01/22/2017)

 

 

Service | Status | Description
Blogger | On for everyone | Quickly post thoughts, interact with people, and more
Chrome Management | On for everyone | Configure policies for Chrome browsers
Chrome Web Store | On for everyone | Marketplace for Chrome Web Apps.
FeedBurner | On for everyone | Analyze, optimize, publicize, and monetize your RSS and Atom feeds.
Fusion Tables (experimental) | On for everyone | Share, discuss, merge, and visualize your datasets
Google Bookmarks | On for everyone | Create bookmarks you can access anywhere
Google Books | On for everyone | Search the full text of books (and discover new ones)
Google Chrome Sync | On for everyone | Sync your Google Chrome bookmarks across multiple computers
Google Developers Console | On for everyone | Develop applications using Google APIs and the Google Cloud Platform.
Google Finance | On for everyone | Google Finance
Google Groups | On for everyone | Create mailing lists and discussion groups
Google in Your Language | On for everyone | Volunteer to translate Google’s services into various languages
Google Map Maker | On for everyone | Google Map Maker
Google Maps | On for everyone | Find local businesses, view maps and get directions
Google My Maps | On for everyone | Easily create, share, and publish custom maps.
Google News | On for everyone | Create your own customized Google News
Google Photos | On for everyone | Store and share photos online with Google Photos and Picasa Web Albums
Google Play Developer Console | On for everyone | Distribute your Android content to Google Play
Google Public Data | On for everyone | Public Data
Google Search Console | On for everyone | Get Google’s view of your site
Google Takeout | On for everyone | Copy content in Google accounts for use in another service or account
Google Voice | On for everyone | Google Voice
Mobile Test Tools | On for everyone | Mobile Test Tools – A set of HTML5 test suites along with supporting tools for browser compatibility.
Panoramio | On for everyone | Share photos of your favorite places.
YouTube | On for everyone | YouTube
DART for Publishers | Off | DART for Publishers
DoubleClick Campaign Manager | Off | DoubleClick Campaign Manager
DoubleClick Creative Solutions | Off | DoubleClick Creative Solutions is a rich media production and workflow tool designed for creative agencies to streamline their rich media processes and take control of their turnaround times.
DoubleClick DART Enterprise | Off | Enterprise class software ad serving solution
DoubleClick for Publishers | Off | DoubleClick for Publishers
DoubleClick Search | Off | Manage and optimize pay-per-click advertisements and keywords across all major search engines
Google AdSense | Off | Earn money by displaying ads on your site
Google Advertising Professionals | Off | Become a Qualified Google Advertising Professional
Google AdWords | Off | Find buyers searching for what you sell
Google Analytics | Off | Google Analytics
Google Code | Off | Google’s home for developers
Google Custom Search | Off | Create a search engine tailored to your needs
Google My Business | Off | Get your business on Google for free with Google My Business
Google Payments | Off | A faster, safer and more convenient way to shop online
Google Play | Off | Google Play
Google Shopping | Off | Shop smarter with wishlists of your favorite products
Google Translator Toolkit | Off | Google Translator Toolkit
Google Trips | Off | Google Trips – Your mobile travel assistant
Google+ | Off | Google+
Individual storage | Off | Individual storage
Location History | Off | Location History and Reporting
Merchant Center | Off | Post your products on Merchant Center, find them on Google.
Partner Dash | Off | Partner Dash
Play Books Partner Center | Off | Provide access to administrative interface for publishers to sell ebooks on Google Play and make them discoverable in Google Books.
Web & App Activity | Off | Access and manage your web activity from any computer
YouTube CMS | Off | YouTube Content Management System
YouTube Promoted Videos | Off | Promote your content on YouTube

 

GSuite, O365 & eMail Protection in K12: A Large Scale DNS Record Analysis

Summary

This post looks at a large-scale dataset of school district DNS records (more than 10,000) and offers two takeaways. First, it provides some quantitative evidence on the use of cloud mail and collaboration tools in K12 (specifically those of Google and Microsoft). Second, it looks at how districts are (or are not) using the simple measures available in DNS to lower the risk of phishing email attacks, like those that compromised computer systems at the DNC.

It is important to note that this was based on the “domain of record”; districts, and even individual schools, may have set up one or more of these systems on domains other than their primary domain. Additionally, this looks at districts only and does not attempt to estimate the total number of users of either system by extrapolating district student and staff counts.

School districts that use GSuite (Google Apps for Education) and Microsoft Office 365 (O365) must make specific changes to their domain’s DNS entries in order to verify and use these tools. This means it is possible to estimate the general adoption of these two tools by examining the DNS records of K12 school districts.

A programmatic scan of school district domains of record (based on information from all 50 State Departments of Education) in December 2016 found that, of 10,915 domains, 48% were using Google and 15% Microsoft Office 365, with less than 1% using both.

mx

This analysis was previously conducted in November 2013 using the same source data. At that time the percentage of Google Apps domains was essentially the same and the percentage of Office 365 domains was 5%.

Domains using Microsoft Office 365 showed a much higher rate of employing DNS record measures (SPF and DKIM) to reduce the risk of “spoofed” email than Districts using Google Apps.

System | Count | SPF | SPF% | DKIM | DKIM%
O365 | 1,641 | 1,611 | 98.17% | 193 | 11.76%
Google Apps | 5,229 | 3,133 | 59.92% | 31 | 0.59%

Methodology:

School district domain addresses were identified and collected from their respective state Department of Education websites. While there are more than 14,000 school districts in the US, not all districts were listed with a domain, and in some cases where a domain was listed it was only a web domain that did not correspond to the district’s email domain. 11,093 records were identified, from which invalid URLs and domains with no MX record were eliminated, resulting in 10,915 domain addresses. This approach only looks at the domains of districts, not individual schools, and only at what was determined to be the district’s primary domain. Actual use of either Google or Microsoft is likely greater due to the use of secondary district domains and individual school domains.

The DNS records for these domains were programmatically queried to look for specific entries that are required for the configuration of GSuite (Google Apps for Education) and for Microsoft Office 365 (O365).

Domain Verification:

Both Google Apps and Microsoft Office 365 require that the owner of a domain make a specific change to their DNS records in order to prove that they have control of the domain.

Microsoft Office 365 typically uses a TXT record with the format:

 MS=msXXXX  (where XXXX is a long alpha numeric string)

GSuite (Google Apps for Education) typically uses a TXT record with the format:

google-site-verification=XXXX (where XXXX is a long alpha numeric string)
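The two verification patterns above can be checked with simple string matching. The following is a minimal sketch, not the code used for the original scan: it assumes the TXT record values for a domain have already been fetched as plain strings, and the function name `classify_verification` is hypothetical.

```python
def classify_verification(txt_records):
    """Return the set of services whose verification token appears
    among a domain's TXT record values.

    Assumption: txt_records is a list of strings already retrieved
    from DNS; surrounding quotes are tolerated and matching is
    case-insensitive.
    """
    found = set()
    for record in txt_records:
        value = record.strip().strip('"').lower()
        if value.startswith("google-site-verification="):
            found.add("google")
        elif value.startswith("ms=ms"):
            found.add("o365")
    return found


sample = ['"google-site-verification=abc123"', "MS=ms12345678", "v=spf1 -all"]
print(sorted(classify_verification(sample)))
```

A verification token only shows that someone started the setup for a service; it says nothing about whether mail is actually routed there, which is why the MX records are examined separately.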

MX Records:

MX records control the routing of email and are a strong indicator that a domain is actively using a particular service (Google, Microsoft or other).

Microsoft Office 365 typically has an MX record entry that uses the format:

<domain>.mail.protection.outlook.com

GSuite (Google Apps for Education) typically has an MX record entry that uses the format:

aspmx.l.google.com or  alt<#>.aspmx.l.google.com
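As an illustration of how the MX host names above can be turned into a per-domain label, here is a minimal sketch. It assumes the MX host names have already been looked up and are passed in as strings; the function name `classify_mx` and the four labels are my own choices for illustration, not taken from the original scan.

```python
def classify_mx(mx_hosts):
    """Label a domain as 'google', 'o365', 'both' or 'other'
    based on the host names in its MX records."""
    google = False
    o365 = False
    for host in mx_hosts:
        # Normalise: drop whitespace and the trailing dot of a fully qualified name.
        name = host.strip().rstrip(".").lower()
        if name == "aspmx.l.google.com" or name.endswith(".aspmx.l.google.com"):
            google = True
        elif name.endswith(".mail.protection.outlook.com"):
            o365 = True
    if google and o365:
        return "both"
    if google:
        return "google"
    if o365:
        return "o365"
    return "other"


print(classify_mx(["ASPMX.L.GOOGLE.COM.", "alt1.aspmx.l.google.com."]))
print(classify_mx(["example-org.mail.protection.outlook.com"]))
```

Matching on the host name suffix rather than the full string is what lets the alt<#> variants and the per-domain Office 365 prefix fall into the right bucket.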

 

DNS Scan Results:

Approximately 25% of districts had started the setup process (verification) for one of the two tools but did not complete the setup to the point of routing email through that system. Of these, about half (~12%) had also completed verification for, and are routing mail with, the other of the two tools. Only 25% of districts had no evidence of any Google Apps or Microsoft Office 365 DNS entries.

Category | Count | Percentage*
O365 MX Record | 1,641 | 15.03%
Google MX Record | 5,229 | 47.91%
Both MX Records | 78 | 0.71%
No Google or O365 MX Record | 3,967 | 36.34%
Google Verification, No MX Record | 1,487 | 13.84%
O365 Verification, No MX Record | 1,252 | 11.65%
O365 MX & Google Verification | 535 | 4.98%
Google MX & O365 Verification | 748 | 6.96%
No Google or O365 Entries | 2,691 | 25.05%

 

*Categories overlap, so the total adds up to more than 100%
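To make the overlap concrete, here is a small sketch of how one domain can be counted in more than one row. It reflects my reading of the category names, not the original scan code: the input format (an MX label plus a set of verified services per domain) and the name `tally_categories` are assumptions for illustration.

```python
from collections import Counter

MX_LABELS = {
    "o365": "O365 MX Record",
    "google": "Google MX Record",
    "both": "Both MX Records",
    "other": "No Google or O365 MX Record",
}


def tally_categories(domains):
    """Count table categories for (mx_label, verified_services) pairs.

    mx_label is one of 'google', 'o365', 'both', 'other';
    verified_services is a set drawn from {'google', 'o365'}.
    """
    counts = Counter()
    for mx, verified in domains:
        google_mx = mx in ("google", "both")
        o365_mx = mx in ("o365", "both")

        # Every domain lands in exactly one MX row.
        counts[MX_LABELS[mx]] += 1

        # Verified for a service without routing mail to it.
        if "google" in verified and not google_mx:
            counts["Google Verification, No MX Record"] += 1
            if o365_mx:
                counts["O365 MX & Google Verification"] += 1
        if "o365" in verified and not o365_mx:
            counts["O365 Verification, No MX Record"] += 1
            if google_mx:
                counts["Google MX & O365 Verification"] += 1

        if mx == "other" and not verified:
            counts["No Google or O365 Entries"] += 1
    return counts


sample = [
    ("google", {"google"}),
    ("o365", {"google", "o365"}),
    ("other", set()),
    ("other", {"google"}),
]
for label, count in tally_categories(sample).items():
    print(label, count)
```

With these four made-up domains the category counts sum to eight, which is the same effect that pushes the percentages in the table past 100%.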

Securing eMail through DNS

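Given the risk of malware, ransomware and worse that can result from spoofed email, it is worth looking at whether districts are using the simple measures available in DNS to reduce that risk.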

SPF Records

Sender Policy Framework (SPF) is a simple email-validation system designed to detect email spoofing by providing a mechanism to allow receiving mail exchangers to check that incoming mail from a domain comes from a host authorized by that domain’s administrators (source: Wikipedia).

Enabling SPF is done by adding a DNS record. The chart below shows the percentage of domains scanned that had enabled SPF. The O365 districts showed a much higher rate of configuring the SPF setting.

Category | Count | SPF | SPF%
O365 MX Record | 1,641 | 1,611 | 98.17%
Google MX Record | 5,229 | 3,133 | 59.92%
Other System (No Google or O365 MX Record) | 3,967 | 1,790 | 45.12%
Both MX Records | 78 | 78 | 100.00%
Totals | 10,915 | 6,612 | 60.58%
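For readers who want to check a domain themselves: an SPF policy is published as a TXT record whose value begins with `v=spf1`. The sketch below assumes the TXT values have already been fetched as strings; the helper names `has_spf` and `rate` are hypothetical and not from the original scan.

```python
def has_spf(txt_records):
    """True if any TXT value is an SPF policy, i.e. starts with
    the version tag 'v=spf1' followed by a space or end of string."""
    for record in txt_records:
        value = record.strip().strip('"').lower()
        if value == "v=spf1" or value.startswith("v=spf1 "):
            return True
    return False


def rate(count, total):
    """Percentage of `total` represented by `count`, to two decimals."""
    return round(100.0 * count / total, 2)


print(has_spf(["v=spf1 include:_spf.google.com ~all"]))
print(rate(3133, 5229))  # Google MX row of the table above
```

Requiring a space (or nothing) after `v=spf1` avoids counting unrelated TXT values that merely begin with similar characters.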

DKIM

DomainKeys Identified Mail (DKIM) is an additional email authentication method designed to detect email spoofing. It allows the receiver to check that an email claimed to come from a specific domain was indeed authorized by the owner of that domain. It is intended to prevent forged sender addresses in emails, a technique often used in phishing and email spam. (source: Wikipedia)

DKIM use among K12 districts was negligible.

Category | Count | DKIM | DKIM%
O365 MX Record | 1,641 | 193 | 11.76%
Google MX Record | 5,229 | 31 | 0.59%
No Google or O365 MX Record | 3,967 | 5 | 0.13%
Both MX Records | 78 | 3 | 3.85%
Totals | 10,915 | 232 | 2.13%
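One practical wrinkle with DKIM is that the public key is not published at the domain itself but at `<selector>._domainkey.<domain>`, and the selector is chosen by the sender. The sketch below shows how candidate lookup names can be built and how a returned TXT value can be sanity-checked. The selector list is an assumption (commonly seen defaults for Google and Office 365), the function names are hypothetical, and the post does not say which selectors the original scan probed, so a scan built this way can undercount domains that use custom selectors.

```python
# Assumed default selectors; real deployments may use others.
COMMON_SELECTORS = ("google", "selector1", "selector2")


def dkim_query_names(domain, selectors=COMMON_SELECTORS):
    """Build the DNS names where DKIM keys would be published."""
    domain = domain.strip().rstrip(".").lower()
    return [f"{selector}._domainkey.{domain}" for selector in selectors]


def looks_like_dkim(txt_value):
    """True if a TXT value parses as a DKIM key record with a
    non-empty public key (an empty p= tag means the key was revoked)."""
    tags = {}
    for part in txt_value.strip().strip('"').split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    # The version tag is optional, but if present it must be DKIM1.
    if "v" in tags and tags["v"] != "DKIM1":
        return False
    return bool(tags.get("p"))


print(dkim_query_names("example.org"))
print(looks_like_dkim("v=DKIM1; k=rsa; p=MIGfMA0G"))
```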

How to add these DNS settings in Google Apps and O365

School districts using Google and Office 365 (and other email systems) can take simple measures to improve email security by enabling SPF, DKIM and DMARC.

  • Google settings can be found here for SPF, DKIM and DMARC
  • Microsoft Office 365 settings can be found here for SPF, DKIM and DMARC
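DMARC was not part of the scan above, but it is published the same way as the other two: as a TXT record, in this case at `_dmarc.<domain>`, with a value that starts with `v=DMARC1` and carries a policy tag `p` (none, quarantine or reject). A minimal sketch for reading such a value follows; the function name `parse_dmarc` and the example address are hypothetical.

```python
def parse_dmarc(txt_value):
    """Parse a DMARC TXT value into a dict of tags.

    Returns None if the value does not start with the required
    'v=DMARC1' version tag.
    """
    parts = [p.strip() for p in txt_value.strip().strip('"').split(";") if p.strip()]
    if not parts or parts[0].replace(" ", "") != "v=DMARC1":
        return None
    tags = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


record = "v=DMARC1; p=reject; rua=mailto:dmarc@example.org"
print(parse_dmarc(record))
```

A district that is just starting out would typically publish `p=none` first to collect reports, then tighten the policy once SPF and DKIM are known to be working.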

Clarifying the District Responsibility for “Other” Google Services in GAE

Earlier this year I posted some thoughts about some concerns that the EFF had raised about Google’s responsibilities with regard to non-Google Apps Google services. My take was that the majority of the responsibility is really on the school Google Administrator (and the District). I thought it would be useful to illustrate this perspective by showing the “behind the scenes” view of what a Google Admin has to do to turn on one of Google’s non-Google Apps services, and to point out some extra steps Google added in the last few months to make it even more clear.

 

The admin console makes a clear distinction between Google Apps services (covered by the agreement), other Google services, Marketplace apps (3rd party tools that can be installed domain-wide by the Admin), and SAML apps (3rd party tools that the Admin can set up to use Google as the authentication provider for services that support the SAML 2.0 federated authentication standard).

SAML blog

The majority of non-core services are OFF for Google Apps for Education (GAFE) customers.

override.png

 

But it is also important for K12 schools to make sure that they set the option for releasing new products to manual.

release.png

 

Clicking through to the settings for a “non-core” service, there are notices that the service is not covered by the Google for Work agreement (of which Google for Education is a part). It also includes a notice to make it clear that the admin needs to have the authority in their organization to accept the terms if they do turn it on.

chrome

In the last few months Google added an additional check box and very detailed wording to make it very clear what a school Google Admin’s responsibilities are when they turn on a service that is not covered by the “core” GAE agreement.

terms.png

 

 

 

 

Critical Thinking and the Student Privacy Debate

In my work at a large school district I spend much of my time testing education technology products to make sure that they are safe, secure and private. I read a lot of privacy policies and regularly push back on vendors to hold them accountable for what their policies say and their products do. In recent years privacy advocacy groups have played an important role in keeping the conversation about the importance of student data privacy in the public eye, but sometimes it is also necessary to apply the same “critical thinking” to the claims of both sides.

On December 1st, the Electronic Frontier Foundation (EFF) submitted a complaint* to the Federal Trade Commission (FTC) alleging (https://www.eff.org/press/releases/google-deceptively-tracks-students-internet-browsing-eff-says-complaint-federal-trade) that Google had violated assurances that it made when signing the Student Privacy Pledge (http://studentprivacypledge.org/). As an advocacy group, the EFF has been a strong voice for privacy for all, including students; it has provided information to teachers on a balanced approach to copyright (https://www.teachingcopyright.org/) and produced important privacy-enhancing technologies (HTTPS Everywhere, Privacy Badger, Let’s Encrypt).

The EFF complaint raises three allegations but for this discussion the first and third are best considered together, and they are that…

  1. When students are logged in to their Google for Education accounts (GAfE), student personal information in the form of data about their use of non-educational Google services is collected, maintained, and used by Google for its own benefit, unrelated to authorized educational or school purposes.
  3. Google Collects Student Personal Information Through Changeable Administrative Settings In Chrome and Google Apps for Education Accounts

Regardless of whether the privacy pledge covers only education tools (as the creators of the pledge have indicated, and as is the case with SOPIPA, widely regarded as the best of the current crop of student privacy laws), I think it is important to ask who is responsible (and accountable) for the decision to enable or disable a service outside of the “core” Google Apps for Education tools. For me it is not Google, it is the school. That is our job, and our responsibility. When I log into the Google Admin Console, it is very clear what is and is not part of Google Apps, as shown in this image from Google’s help section:

SAML blog

 

And it is even more clear when I drill down into a specific service (Chrome Sync), which is the subject of the second of the three allegations.

chrome.png

 

 

 

This concept, that the school has “direct control” (in this case the ability to turn services on and off), seems to be fundamental to the idea of outsourcing specific technical functions to Google as a “school official”. So in the absence of a solid assurance from Google that an “additional Google service” is not collecting, using or sharing data for purposes other than providing a service to a user, schools should turn off (or not turn on) the tool, OR get parent permission before turning it on, as some schools do for a variety of “web 2.0” tools. So let’s see if we can focus on what is needed, which is:

 

  1. Making sure the people in schools who have the responsibility for this have the training (and the backing from school leadership) to do this, and
  2. Making sure that vendors (all vendors, not just the ones the media, legislators, advocacy groups and competitors fixate on) provide accurate, clear information about what information is collected, used and shared (not just whether it is used for ads). Senator Franken’s letter (http://techcrunch.com/2016/01/13/sen-franken-raises-questions-regarding-googles-student-data-collecting/) to Google is a good example of questions some school leaders may have, and that should be asked of ANY EdTech vendor.

Digging Deeper into FERPA Directory Information

Earlier this month Bill Fitzgerald posted a thoughtful piece on potential issues with FERPA’s “Directory Information” exception. Among the excellent points made was that there is a disconnect between what FERPA treats as data that would not generally be considered harmful if released, and how the average person would react if another organization unintentionally released this type of data (name, email, address, height, weight): they might well consider it a data breach.

Like anything related to FERPA, there is always more to say, and I thought I would add a few thoughts.

1)      There is a lot of confusion around what “opt-out” means. Schools are required to notify parents of their FERPA policies annually and give them the opportunity to opt out. I have seen some cases where parents assumed that this meant that they were opting out of the school sending their child’s data to ANY 3rd party under other FERPA exceptions. This is not the case. Directory information is a relatively useless exception for creating student accounts: one, because parents can opt out, and two, because of the next point.

2)      Schools can’t use the directory information exception to create student accounts in a 3rd party service. I have heard people say that “it is OK to use X service, we are sending them an Excel file with just directory information so they can create student accounts.” Once the students log in they will be creating “education records”, so you’d better be thinking about the “School Official” exception instead.

3)      Lastly, I think that the objections to directory information come when parents are unclear about who the information can be shared with, or that it could be used for non-school purposes such as marketing. There is a path within FERPA for a more privacy-friendly approach to directory information, and it is called limited directory information. Looking at the 2011 Federal Register final rule changes to FERPA, there is a reference to a “limited directory information policy” (§ 99.37(d)). The basis for the clarification goes to some of Bill’s points (“concerns about the potential misuse by members of the public of personally identifiable information about students, including potential identity theft.”) and states that

 “an educational agency or institution may specify in the public notice it provides to parents and eligible students in attendance provided under § 99.37(a) that disclosure of directory information will be limited to specific parties, for specific purposes, or both.”

The full input  can be found here

In a cursory internet search this afternoon I found a number of school districts that have adopted this approach for some or all data elements.

This says to me that there is an opportunity (and a need) for schools to better understand how to provide more clear and privacy friendly choices to parents within existing FERPA rules.