Criteo Open Sources One Terabyte Machine Learning Dataset
Tuesday, June 23, 2015
Criteo is releasing to the open source community an anonymized machine learning dataset of more than four billion lines, totaling over one terabyte, drawn from Criteo’s advertising click prediction data. The dataset is hosted on Microsoft Azure, and details on how to access, download, and use it can be found at the Criteo Labs website.
The goal of releasing the dataset is to support academic research and innovation in distributed machine learning algorithms. Anonymized datasets drawn from real-world applications can help academic researchers test, refine, and advance machine learning platforms.
Criteo relies on its own proprietary distributed learning algorithms to predict when a consumer is most likely to click on a particular ad, with the goal of increasing the return on an advertiser’s investment in ad delivery. Criteo sees over 30 billion HTTP requests per day (with peaks of as many as two million requests per second), delivers three billion unique banner advertisements per day, and stores 20 terabytes of new data daily, with a capacity for 37 petabytes of raw storage.
The released dataset has already been put to use as a benchmark by researchers at Carnegie Mellon University. “Criteo's one terabyte dataset has proven invaluable for benchmarking the scalability of the learning algorithms for high throughput click-through-rate estimation, which we are developing as part of our Marianas Labs project,” said Alexander Smola, Professor at Carnegie Mellon University.
Read more: http://www.criteo.com/