6/23/2015 9:09:15 AM
Criteo Open Sources One Terabyte Machine Learning Dataset
Anonymized Datasets, Banner Advertisements, Machine Learning
App Developer Magazine

Monetize


Tuesday, June 23, 2015

Stuart Parkerson


Criteo is releasing to the open source community an anonymized machine learning dataset of more than four billion lines, totaling over one terabyte, built from Criteo's advertising click-prediction data. The dataset is hosted on Microsoft Azure, and details on how to access, use, and download it are available on the Criteo Labs website.
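To give a feel for the scale, here is a rough back-of-the-envelope sketch of the average size of one line. It assumes the round figures from the announcement (4 billion lines, 1 TB) and decimal units (1 TB = 10^12 bytes). Because both numbers are stated as lower bounds ("more than", "over"), the result is only a ballpark, not a property of the actual files.

```python
# Back-of-the-envelope: average size of one line in the released dataset.
# Assumption: the round figures from the announcement (4 billion lines, 1 TB)
# and decimal units (1 TB = 10**12 bytes); the real values are lower bounds.
LINES = 4_000_000_000
TOTAL_BYTES = 10**12


def average_line_bytes(total_bytes: int, lines: int) -> float:
    """Return the mean number of bytes per line."""
    return total_bytes / lines


if __name__ == "__main__":
    print(f"~{average_line_bytes(TOTAL_BYTES, LINES):.0f} bytes per line")
```

Under these assumptions each line averages roughly 250 bytes, so the dataset's size comes from the sheer number of records rather than from large individual rows.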

The goal of releasing the dataset is to support academic research and innovation in distributed machine learning algorithms. Anonymized datasets drawn from real-world applications let academic researchers test, refine, and advance machine learning platforms.

Criteo relies on its own proprietary distributed learning algorithms to predict when a consumer is most likely to click on a particular ad, with the goal of increasing the return on an advertiser's investment in ad delivery. Criteo sees over 30 billion HTTP requests per day (with peaks of up to two million requests per second), delivers three billion unique banner advertisements per day, and stores 20 terabytes of new data daily, with 37 petabytes of raw storage capacity.
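The traffic and storage figures above can be turned into a few derived numbers. The sketch below is only illustrative arithmetic. It assumes decimal units (1 PB = 1,000 TB), traffic spread evenly over 24 hours for the mean rate, and that all raw capacity could hold new data, ignoring replication, compression, and deletion, which in practice would change the storage estimate considerably.

```python
# Back-of-the-envelope figures derived from the traffic numbers quoted above.
# Assumptions: decimal units (1 PB = 1000 TB), traffic spread evenly over a day
# for the mean rate, and all raw capacity usable for new data (ignores
# replication, compression and deletion).
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

REQUESTS_PER_DAY = 30_000_000_000       # "over 30 billion HTTP requests per day"
PEAK_REQUESTS_PER_SEC = 2_000_000       # peaks of up to two million requests/s
NEW_DATA_TB_PER_DAY = 20                # "20 terabytes of new data daily"
RAW_CAPACITY_TB = 37 * 1000             # "37 petabytes of raw storage"


def mean_requests_per_sec(per_day: int) -> float:
    """Average request rate if daily traffic were spread evenly over 24 hours."""
    return per_day / SECONDS_PER_DAY


def days_to_fill(capacity_tb: float, tb_per_day: float) -> float:
    """Days of ingest the raw capacity could absorb at a constant daily rate."""
    return capacity_tb / tb_per_day


if __name__ == "__main__":
    mean_rps = mean_requests_per_sec(REQUESTS_PER_DAY)
    print(f"mean rate:     ~{mean_rps:,.0f} requests/s")
    print(f"peak-to-mean:  ~{PEAK_REQUESTS_PER_SEC / mean_rps:.1f}x")
    days = days_to_fill(RAW_CAPACITY_TB, NEW_DATA_TB_PER_DAY)
    print(f"capacity lasts ~{days:,.0f} days (~{days / 365:.1f} years) of new data")
```

Under these assumptions the mean rate is about 347,000 requests per second, so the quoted peak is nearly six times the average, which is why serving infrastructure has to be sized for peaks rather than averages. The storage line (about 1,850 days, or roughly five years) is an upper bound, since replicated storage holds far less unique data than its raw capacity.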

The released dataset has already been put to use as a benchmark by researchers at Carnegie Mellon University. “Criteo's one terabyte dataset has proven invaluable for benchmarking the scalability of the learning algorithms for high throughput click-through-rate estimation, which we are developing as part of our Marianas Labs project,” said Alexander Smola, Professor at Carnegie Mellon University.

Read more: http://www.criteo.com/