Google Datasets to Tinker With for Data Processing and Visualizations

For the benefit of the community, Google has released various datasets gathered over years of data collection and scaling, along with training corpora built from public web pages. Some of them are: co-occurrence of words for word n-gram model training; job queue traces from Google clusters; 800M documents annotated with Freebase entities; 40M disambiguated mentions in 10M web pages linked to Wikipedia entities; a human-judged corpus of binary relations about Wikipedia public figures; Wikipedia […]