A Simpler Way to Improve Computer Vision – ScienceDaily


Before a machine learning model can complete a task, such as identifying cancer in medical images, the model must be trained. Training an image classification model typically involves showing it millions of example images collected in a huge dataset.

However, using real image data can raise practical and ethical concerns: images may contravene copyright laws, violate people’s privacy, or be biased against a particular racial or ethnic group. To avoid these pitfalls, researchers can use image generation software to generate synthetic data to train the model. But these techniques are limited because specialized knowledge is often required to manually design image generation software that can generate effective training data.

Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have taken a different approach. Instead of designing custom image creation programs for a specific training task, they collected a data set of 21,000 publicly available programs from the Internet. Then they used this large set of basic image generation software to train a computer vision model.

These programs produce a wide variety of images that display simple colors and textures. The researchers did not curate or alter the programs, each of which consisted of only a few lines of code.

The models they trained using this large data set of software classified the images more accurately than other synthetically trained models. And although their models underperformed models trained on real data, the researchers showed that increasing the number of image programs in the data set also increased model performance, revealing a path to higher accuracy.

“It turns out that using a large set of uncurated programs is actually better than using a small set of programs that people need to curate. Data is important, but we’ve shown that you can go very far without real data,” says Manel Baradad, a graduate student in Electrical Engineering and Computer Science (EECS) working at the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper describing the technique.

Co-authors include Tongzhou Wang, an EECS graduate student at CSAIL; Rogerio Feris, principal scientist and manager at the MIT-IBM Watson AI Lab; Antonio Torralba, the Delta Electronics Professor of Electrical Engineering and Computer Science and a CSAIL member; and senior author Phillip Isola, an associate professor in EECS and CSAIL; along with others at JPMorgan Chase and Xyla, Inc. The research will be presented at the Conference on Neural Information Processing Systems.

Rethinking pretraining

Machine learning models are usually pretrained, meaning they are first trained on one dataset to learn parameters that can then be used to tackle a different task. An X-ray classification model might be pretrained on a huge dataset of synthetically generated images before being trained for its actual task on a much smaller dataset of real X-rays.
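This two-phase recipe can be sketched with a toy linear model, purely as an illustration of pretraining followed by fine-tuning; the "synthetic" and "real" datasets below are random stand-ins, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Phase 1: "pretrain" on a large stand-in synthetic dataset.
true_w = np.arange(8, dtype=float)
X_syn = rng.normal(size=(1000, 8))
y_syn = X_syn @ true_w + rng.normal(scale=0.1, size=1000)
w, *_ = np.linalg.lstsq(X_syn, y_syn, rcond=None)  # pretrained parameters

# Phase 2: "fine-tune" on a much smaller "real" dataset whose task
# differs slightly, starting from the pretrained parameters instead
# of from scratch.
X_real = rng.normal(size=(50, 8))
y_real = X_real @ (true_w + 0.5)
for _ in range(300):  # a few gradient-descent steps
    grad = X_real.T @ (X_real @ w - y_real) / len(y_real)
    w -= 0.1 * grad
```

Because fine-tuning starts from parameters that are already close to useful, far less real data is needed in the second phase than the first.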

These researchers previously showed that they could use a handful of image generation programs to create synthetic data for pretraining models, but the programs needed to be carefully designed so that the synthetic images matched certain properties of real images. This made the technique difficult to scale.

In the new work, they used a massive dataset of uncurated image-generation programs instead.

They began by gathering a collection of 21,000 image generation programs from the Internet. The programs are written in a simple programming language and consist of just a few lines of code each, so they generate images quickly.

“These programs are designed by developers around the world to produce images that have certain characteristics that we’re interested in. They produce images that kind of look like abstract art,” explains Baradad.

These simple programs run so quickly that the researchers did not need to render the images in advance to train the model. They found that they could generate images and train the model simultaneously, which streamlined the process.
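As a rough sketch of the idea, a "program" here can be as small as a function that renders a parameterized texture, and images can be streamed straight into training instead of being pre-rendered. The toy generator below is an invented illustration, not one of the paper's actual 21,000 programs:

```python
import numpy as np

def sine_texture(seed: int, size: int = 32) -> np.ndarray:
    """A toy few-line 'image program': renders a colored sine-wave
    texture whose frequencies and phases depend on the seed."""
    rng = np.random.default_rng(seed)
    freq = rng.uniform(0.1, 1.0, size=3)       # one frequency per RGB channel
    phase = rng.uniform(0, 2 * np.pi, size=3)  # one phase per RGB channel
    y, x = np.mgrid[0:size, 0:size]
    channels = [np.sin(freq[c] * (x + y) + phase[c]) for c in range(3)]
    img = np.stack(channels, axis=-1)          # shape (size, size, 3)
    return ((img + 1) / 2).astype(np.float32)  # rescale from [-1, 1] to [0, 1]

def image_stream(num_programs: int, size: int = 32):
    """Generate (program_index, image) pairs on the fly, cycling through
    the programs, instead of pre-rendering a fixed dataset."""
    step = 0
    while True:
        k = step % num_programs
        yield k, sine_texture(seed=k, size=size)
        step += 1
```

A training loop would then pull batches from `image_stream` directly, so no synthetic images ever need to be stored on disk.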

They used their massive dataset of image-generation programs to pretrain computer vision models for both supervised and unsupervised image classification tasks. In supervised learning, the image data is labeled, while in unsupervised learning the model learns to categorize images without labels.
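For the supervised setting, one natural labeling scheme (an assumption here, used only for illustration) is that labels come for free from the generation process: each image is tagged with the index of the program that produced it, while the unsupervised setting would simply discard those labels.

```python
import numpy as np

def labeled_batch(programs, batch_size, rng=None):
    """Sample a batch of synthetic images, labeling each image with the
    index of the program that produced it. `programs` is a list of
    callables mapping an integer seed to an image array."""
    rng = rng or np.random.default_rng(0)
    labels = rng.integers(0, len(programs), size=batch_size)
    images = np.stack(
        [programs[k](int(rng.integers(1 << 30))) for k in labels]
    )
    return images, labels
```

Any image-producing callables can be plugged in as `programs`; the batch is generated fresh each call, matching the generate-while-training setup described above.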

Accuracy improvement

When they compared their pretrained models against state-of-the-art computer vision models pretrained on synthetic data, their models were more accurate, meaning they put images into the correct categories more often. While accuracy was still lower than that of models trained on real data, their technique narrowed the performance gap between models trained on real data and those trained on synthetic data by 38 percent.

“Importantly, we show that performance scales logarithmically with the number of programs you collect. We don’t saturate performance, so if we collect more programs, the model will perform better. So there is a way to extend our approach,” Baradad says.
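The logarithmic trend Baradad describes can be pictured by fitting a line in log(n); the accuracies below are made-up placeholder numbers, used only to show the shape of such a fit, not results from the paper:

```python
import numpy as np

# Hypothetical (not the paper's) accuracies at increasing program counts.
n_programs = np.array([100, 1000, 10000, 21000])
accuracy = np.array([0.40, 0.48, 0.56, 0.585])

# Accuracy ~ a + b * log(n) is a straight line in log(n).
b, a = np.polyfit(np.log(n_programs), accuracy, deg=1)

def predicted_accuracy(n):
    """Extrapolate the fitted logarithmic trend to n programs."""
    return a + b * np.log(n)
```

A positive slope `b` with no flattening at the right edge is what "not saturating" means here: adding programs keeps buying accuracy, just at a logarithmically diminishing rate.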

The researchers also pretrained with each image-generation program individually, in an effort to uncover factors that contribute to model accuracy. They found that when a program produced a more diverse set of images, the model performed better. They also found that colorful images with scenes filling the entire canvas tended to improve model performance the most.

Now that they have demonstrated the success of this pre-training approach, the researchers want to extend their method to other types of data, such as multimedia data that includes text and images. They also want to continue exploring ways to improve image classification performance.

“There is still a gap to close with models trained on real data. This gives our research a direction that we hope others will follow,” Baradad says.


