The glue that holds machine learning and artificial intelligence together is data. Without data to build complex learning algorithms from and to power those life-like AI experiences, neither is worth a brass farthing.
Charly Walther, VP of product and growth at Gengo.ai, knows this space well. He joined Gengo from Uber, where he was a product manager in Uber’s Advanced Technologies Group. He says that a well-trained AI should eventually learn to produce the desired output by itself, and that building proprietary data sets through approaches such as crowdsourcing can help pave the way to a successful platform.
We recently spoke with Charly to get his insight on all things machine learning, how Gengo.ai was able to produce high-quality multilingual data, and how developers can lean on the platform for their own AI projects.
Walther: It’s a very exciting time for startups entering the field of machine learning and AI. While the opportunities are tremendous, the hype surrounding machine learning distracts investors and founders from a key hurdle: in many cases it’s data, not algorithms, that will determine whether the final product will deliver on the promises made.
Put simply, AI training data refers to input data for a machine-learning model alongside the desired output, which the AI should eventually learn to produce by itself. For example, a training data set can contain images of cats and dogs labeled correctly as cats and dogs respectively, tweets marked as either positive or negative for a sentiment analysis algorithm, or audio recordings alongside transcripts for a transcription machine-learning engine.
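To make the idea of "input data alongside the desired output" concrete, here is a minimal Python sketch of the three examples above as labeled pairs. The `Example` class, its field names, the file names, and the sample labels are all hypothetical and chosen for illustration; real projects would store actual images, tweets, and audio rather than placeholder paths.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Example:
    """One training example: an input paired with the output the model should learn to produce."""
    input_data: str      # e.g. an image path, tweet text, or audio file path (hypothetical)
    desired_output: str  # e.g. a class label, a sentiment, or a transcript


# Image classification: placeholder file paths stand in for the actual pixels.
images: List[Example] = [
    Example("img_001.jpg", "cat"),
    Example("img_002.jpg", "dog"),
]

# Sentiment analysis: tweet text labeled positive or negative.
tweets: List[Example] = [
    Example("Loving the new update!", "positive"),
    Example("App keeps crashing, so frustrating.", "negative"),
]

# Transcription: an audio clip paired with its reference transcript.
audio: List[Example] = [
    Example("clip_017.wav", "hello and welcome to the show"),
]


def split_xy(examples: List[Example]) -> Tuple[List[str], List[str]]:
    """Separate examples into parallel lists of inputs and targets, a shape many training APIs expect."""
    return [e.input_data for e in examples], [e.desired_output for e in examples]


if __name__ == "__main__":
    xs, ys = split_xy(tweets)
    print(xs)
    print(ys)
```

Whatever the storage format, the essential point is the pairing: every input carries the answer the model is expected to learn, so any labeling error becomes something the model learns too.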
Without rigorously accurate and categorized data, machine-learning algorithms simply cannot attain an advanced level of operation. Algorithms find relationships, develop understanding, make decisions, and evaluate their confidence based on the training data they’re given. And the better the training data is, the better the model performs.
So, if businesses want to succeed with their machine-learning projects, their top priority should be building proprietary data sets. The question is, what’s the most cost-effective way to do that?
Walther: Gengo started life as a crowd-powered translation service with a focus on quality and precision. After several years, we had built a multilingual crowd of more than 21,000 skilled linguists and were providing a range of language services to leading Fortune 500 companies like Amazon and Facebook. We often helped them with projects aimed at improving their AI investments, which gave us an idea: what if we optimized our offering for developers working on machine-learning projects that require specialized language datasets? This led to the formation of Gengo.ai, a platform for any business or enterprise around the world that needs access to high-quality multilingual data to succeed with its AI strategy.
Walther: With Gengo.ai, developers have access to a very large crowd platform that is 100-percent focused on language tasks — those that involve natural language, speech, communication, and multilingual projects. Many developers are probably already familiar with Amazon Mechanical Turk: an early player in this field, and a great resource, especially for smaller tasks. It’s also cost-competitive, because it draws on an almost infinite pool of cheap labor.
But it also has its drawbacks. With Mechanical Turk, you’re less likely to find a concentration of specific experts in your crowd, which matters for developers with specialized needs. Also, Amazon offers no hand-holding to ensure data quality, so the burden of quality control falls on you, the client. As a result, developers often need to iterate through several submissions to the crowd to arrive at the dataset they need, which can increase overall project duration and costs.
Other crowdsourcing solutions such as Upwork are great for finding one or two people to work with closely. However, they don’t scale well, because they aren’t built with platform technology such as job distribution and quality management systems to manage tasks effectively across 10, 100, or 1,000+ people.
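The interview doesn’t describe how Gengo.ai’s job distribution or quality management systems actually work. As a hypothetical illustration of the general idea, the Python sketch below shows one common crowdsourcing pattern: assign each task to several workers (redundancy), then accept a label only when enough of them agree, and route the rest to review. The function names, worker IDs, and the 0.66 agreement threshold are assumptions made for this example, not Gengo.ai’s method.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List, Tuple


def assign_tasks(task_ids: List[str], workers: List[str], redundancy: int = 3) -> Dict[str, List[str]]:
    """Round-robin each task to `redundancy` distinct workers (hypothetical job distribution)."""
    if redundancy > len(workers):
        raise ValueError("redundancy cannot exceed the number of workers")
    pool = cycle(workers)  # consecutive draws are distinct while redundancy <= len(workers)
    return {task: [next(pool) for _ in range(redundancy)] for task in task_ids}


def aggregate(labels: Dict[str, List[str]], min_agreement: float = 0.66) -> Tuple[Dict[str, str], List[str]]:
    """Majority-vote each task's labels; tasks below `min_agreement` are queued for human review."""
    accepted: Dict[str, str] = {}
    needs_review: List[str] = []
    for task, votes in labels.items():
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= min_agreement:
            accepted[task] = label
        else:
            needs_review.append(task)
    return accepted, needs_review


if __name__ == "__main__":
    assignments = assign_tasks(["t1", "t2"], ["ana", "ben", "chen", "dev"], redundancy=3)
    print(assignments)
    accepted, review = aggregate({
        "t1": ["positive", "positive", "negative"],  # 2 of 3 agree: accepted
        "t2": ["positive", "negative", "neutral"],   # no majority: sent to review
    })
    print(accepted, review)
```

Redundancy trades extra labeling cost for the ability to detect disagreement automatically; doing this by hand across hundreds of workers is exactly the kind of overhead that platform tooling is meant to absorb.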
Walther: Like most outsourcing decisions, it comes down to defining your core competence. Our clients tell us they don’t have the in-house expertise to manage the collection and curation of a large language data set, and that the opportunity cost of building such a platform with their own engineering resources is too high, so the ROI doesn’t pencil out. By outsourcing, the development team also benefits from the cost and time efficiencies of a service specifically designed to manage the process of defining, submitting, and gathering language-based data. Ultimately, it all boils down to one important outcome: faster time to market for a better-trained AI product.
Walther: Here are some general guidelines to maximize your project success.
Walther: Here are three examples that give you an idea of the enormous breadth of projects we undertake: