12/10/2018 10:10:56 AM
Good AI starts with good data
App Developer Magazine




Heather Ames Versace in Artificial Intelligence, Monday, December 10, 2018

Creating powerful AI solutions requires developers to use the right kind of data to train the algorithm effectively.

Nowadays, it seems like every company is doing something with AI - or if they’re not, they’d like to be. The technology promises to improve the way we work and live, and industries ranging from manufacturing to retail to inspections, and everything in between, are scrambling to build their own AI solutions. But where to begin?

I like to say that AI is like cooking - it’s all about the ingredients. Without good ingredients, even the best recipes will fall flat. The same goes for AI, but in this case, the ingredients are your data. If organizations don’t take a close look at the data they need to develop an AI solution and ensure it’s prepared and organized effectively, AI solutions will be riddled with inefficiencies - whether the result is biased algorithms, ineffective solutions, or AI that simply doesn’t work.
High-functioning AI begins and ends with good data.

Data: The Good, The Bad And The Ugly

One of the biggest challenges with deep neural networks (DNNs) is the cumbersome process of training them - AI systems don’t just need data to learn about the world, they need hundreds of thousands of times more data than humans do.

Luckily, we humans are currently producing 2.5 quintillion bytes of data each day. The internet is an absolute data gold mine. Unluckily, most of it isn’t fair game, because people are generally unwilling to share their personal data, even if it does mean building better AI systems.

And, if you’re lucky enough to overcome the hurdle of getting enough data, there is still the question of quality. Not all data is created equal. To recognize an object or behavior, AI must be trained on data captured under many different conditions, from various angles, and so on. Otherwise, algorithmic bias is inevitable.

As data scientist Daniel Shapiro details in a recent post, there are many different data quality pitfalls, including data sparsity, data corruption, irrelevant data, missing important patterns, wrong patterns, and bad labeling.

The Right Data For Computer Vision Solutions

The most successful companies are the ones that are able to break down data silos across their organizations and gather a holistic view of the data they have available. Once they have done this, they are able to create processes for augmenting their data to reach the level necessary for a productized solution.

This is where the good data lives: they own it, and it is perfectly suited to their specific use case.

People often ask me how much data is needed to create a meaningful solution. Our rule of thumb for a given use case is that 1,000 images/class is the barrier to entry, and, in order to reach production level accuracy (90%+), 5,000–10,000 images/class are required.
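As a rough illustration of this rule of thumb, the sketch below checks per-class image counts against the two thresholds. The function name, the tier labels, and the sample counts are hypothetical; only the 1,000 and 5,000 figures come from the rule of thumb above.

```python
# Thresholds taken from the rule of thumb in the text (images per class).
ENTRY_MIN = 1_000        # barrier to entry
PRODUCTION_MIN = 5_000   # lower end of the 5,000-10,000 production range


def readiness(images_per_class):
    """Map each class name to a rough readiness tier based on its image count."""
    tiers = {}
    for name, count in images_per_class.items():
        if count < ENTRY_MIN:
            tiers[name] = "below entry"
        elif count < PRODUCTION_MIN:
            tiers[name] = "prototype"
        else:
            tiers[name] = "production range"
    return tiers


# Hypothetical counts for an inspection use case.
counts = {"crack": 7_200, "corrosion": 1_850, "loose_bolt": 430}
for name, tier in readiness(counts).items():
    print(f"{name}: {tier}")
```

A check like this only counts images; it says nothing about whether those images cover the angles and conditions the model will face.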

However, the issue of quality - even when it seems like there’s an ample quantity of data - prevails. I’ve seen examples of this in the inspection industry, where I’ve been amazed at how many of their images focus on only one angle of an object or are taken in only one specific lighting condition. Photos like these aren’t going to give their AI-powered drones the information they need to do their jobs.

In other words, bad photos equal bad drones.

When Good Images Go Wrong

But it’s not just the quality of the photos themselves that matters; there is ample opportunity for good photos to be botched during the tagging process.

Because AI applications require thousands of images to be tagged, humans can tag poorly or introduce errors - especially because current tools are simple picture editing tools, like Microsoft Paint, which weren’t built for this purpose. Even small imprecisions, compounded over thousands of images, can have a large impact on the accuracy of a computer vision model. And if you think about a production-grade product or solution, every percentage point increase in accuracy can have a big impact on the organization.
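One way to put a number on a tagging imprecision is intersection over union (IoU) between the box a labeler drew and the box that was intended. The article does not name a metric, so IoU is an assumption here, and the box coordinates are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


intended = (100, 100, 150, 150)   # hypothetical 50x50 pixel object
drawn = (105, 105, 155, 155)      # same size, dragged 5 pixels off in x and y

print(round(iou(intended, drawn), 3))
```

In this made-up case, a 5-pixel slip on a 50-pixel object already drops the overlap to about 0.68, which gives a sense of how small errors can add up across a dataset.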

It’s also worth mentioning that, since data tagging cost is proportional to the time spent tagging, this step alone often costs tens to hundreds of thousands of dollars per project.

A Good Tagging Tool Is The Key Ingredient

I recently attended a webinar about implementing AI for inspection services. The host spoke about how they are paying fifty to one hundred dollars an hour to have civil engineers do annotation and classification work. They felt they needed industry experts tagging the images, but it was costing them a huge amount of money, and it was their biggest bottleneck.
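To see how such hourly rates add up, here is a back-of-the-envelope estimate that assumes cost scales linearly with tagging time: cost = tagging hours × hourly rate. The image count and the seconds-per-image figure are assumptions chosen for illustration; only the $50 to $100 hourly range comes from the webinar example.

```python
def tagging_cost(num_images, seconds_per_image, hourly_rate):
    """Estimate labeling cost, assuming cost scales linearly with tagging time."""
    hours = num_images * seconds_per_image / 3600
    return hours * hourly_rate


# Hypothetical project: 10 classes x 5,000 images, 60 seconds of tagging per image.
num_images = 10 * 5_000
for rate in (50, 100):  # dollars per hour, from the webinar example
    cost = tagging_cost(num_images, seconds_per_image=60, hourly_rate=rate)
    print(f"${rate}/hour: ${cost:,.0f}")
```

Under these assumptions the tagging bill lands between roughly $42,000 and $83,000, and it grows in direct proportion to the time spent on each image.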

Data-labeling services like Scale API, Mighty AI, and CloudFactory, which contract with hundreds of labelers, often overseas, are a much more efficient and cost-effective alternative. Companies looking to handle their tagging internally, meanwhile, need a precise, automated, purpose-built annotation tool.

A(I) Recipe for Success

Engineers often refer to AI development as a “sprint,” striving to rapidly test, iterate and deploy AI. But AI is deeply rooted in research, and the reality is that, traditionally, there’s a long road to production. With the right data tagging tools, though, rapid testing is within reach - in turn, enabling faster iteration and deployment.

Investing in the best tools and the right people to accurately and efficiently annotate your data will make a huge difference in the success of a production-grade AI solution. And - with any luck - this “recipe” for data tagging and AI app development success will keep your customers coming back for seconds.






This content is made possible by a guest author, or sponsor; it is not written by and does not necessarily reflect the views of App Developer Magazine's editorial staff.