Anomalo Inc. today launched a new tool that aims to help enterprises keep tabs on the unstructured information that’s becoming critical to the success of artificial intelligence systems.
The Databricks Inc.- and Snowflake Inc.-backed startup says its new Unstructured Data Monitoring tool gives enterprises an easy way to spot any problems with the enormous volumes of unstructured data, such as text files and images, hosted in any location.
Anomalo is best known for its data quality platform, which is used by companies to scan structured data that makes up their business records for quality issues. It works by scanning the information stored neatly in database rows and columns, checking for out-of-date records that need to be replaced with fresh information, duplicate database rows, missing fields and so on. In addition to identifying erroneous records, Anomalo also provides tools for fixing them, automating the steps involved in identifying the root cause of data quality issues.
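To make those structured-data checks concrete, here is a minimal Python sketch of the three issues named above: duplicate rows, missing fields and out-of-date records. It is an illustration only, not Anomalo’s implementation; the function name `check_rows`, its parameters and the sample field names are all assumptions made for this example.

```python
from datetime import date, timedelta

def check_rows(rows, key_fields, required_fields, date_field, today, max_age_days):
    """Flag row indices that look duplicated, incomplete, or stale.

    Hypothetical helper for illustration; rows are plain dicts.
    """
    seen_keys = set()
    report = {"duplicate": [], "missing": [], "stale": []}
    cutoff = today - timedelta(days=max_age_days)

    for index, row in enumerate(rows):
        # Duplicate: the same combination of key fields appeared earlier.
        key = tuple(row.get(field) for field in key_fields)
        if key in seen_keys:
            report["duplicate"].append(index)
        else:
            seen_keys.add(key)

        # Missing: a required field is absent or empty.
        if any(row.get(field) in (None, "") for field in required_fields):
            report["missing"].append(index)

        # Stale: the record was last updated before the cutoff date.
        updated = row.get(date_field)
        if updated is not None and updated < cutoff:
            report["stale"].append(index)

    return report

if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "a@example.com", "updated": date(2025, 1, 10)},
        {"id": 1, "email": "a@example.com", "updated": date(2025, 1, 10)},
        {"id": 2, "email": "", "updated": date(2023, 1, 1)},
    ]
    print(check_rows(sample, ["id"], ["email"], "updated", date(2025, 2, 1), 365))
```

A production platform would presumably run checks like these inside the warehouse and learn thresholds from history rather than take them as fixed arguments, but the categories of problem are the same.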
With its new tool, Anomalo is now bringing its expertise to the vast amounts of unstructured information that resides in cloud data warehouses and data lakes, looking to help companies ensure trust in every kind of data type.
It’s a key development that should expand the usefulness of Anomalo’s platform, since unstructured data makes up the vast majority of the records most companies store. At the average enterprise, structured data that’s stored neatly in databases accounts for only around 20% of all files. The other 80% tends to be unstructured data, including call transcripts, text and PDF documents, emails, messages, order forms, audio and image files, and the like.
Though this information generally wasn’t deemed mission-critical in the past, that’s changing with the rapid emergence of AI. High-quality and domain-specific information is vital for training and customizing the large language models that power generative AI workloads. Companies generally have tons of this information available, but the challenge is that they have very few clues about what’s inside it and whether it can be trusted.
Anomalo’s unstructured data monitoring tool aims to change that. It introduces a new capability called Anomalo Workflows, which acts as a hub for managing unstructured information as well as monitoring it.
With this new tool, companies can identify and fix quality issues such as duplicate files, errors, personally identifiable information and abusive language. It also provides a way to analyze large volumes of unstructured information to try to extract useful business insights and, finally, convert it into clean, reusable datasets for training AI models.
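As a rough illustration of what two of those checks involve, the Python sketch below finds exact duplicate documents by hashing their contents and flags likely personally identifiable information with simple patterns. It is a simplified sketch under stated assumptions, not Anomalo’s method: the function `scan_documents` and both regular expressions are hypothetical, and the patterns only catch plain email addresses and 10-digit phone numbers written with separators.

```python
import hashlib
import re

# Deliberately simple patterns, assumed for this example only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def scan_documents(docs):
    """Scan a mapping of document name -> text.

    Returns exact duplicates (by content hash) and per-document PII matches.
    """
    first_seen = {}   # content hash -> name of the first document with it
    duplicates = []   # (duplicate name, original name) pairs
    pii = {}          # document name -> list of matched strings

    for name, text in docs.items():
        # Hash the whitespace-trimmed text so trailing newlines don't matter.
        digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if digest in first_seen:
            duplicates.append((name, first_seen[digest]))
        else:
            first_seen[digest] = name

        hits = EMAIL_RE.findall(text) + PHONE_RE.findall(text)
        if hits:
            pii[name] = hits

    return {"duplicates": duplicates, "pii": pii}

if __name__ == "__main__":
    print(scan_documents({
        "ticket1.txt": "Call me at 555-123-4567.",
        "ticket2.txt": "Call me at 555-123-4567.\n",
        "ticket3.txt": "Reach jane@example.com",
    }))
```

Near-duplicate detection and abusive-language screening generally need fuzzier techniques, such as similarity hashing or a trained classifier, which this sketch does not attempt.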
What’s impressive is the sheer volume of information that Anomalo Workflows can handle. The company says it can analyze up to 100,000 documents in a single operation, and can be set up to run continuously as new information is fed into it. What previously took months to sift through manually can now be automated in a matter of minutes, the company says.
Anomalo co-founder and Chief Executive Elliot Shmukler said everyone is scrambling to get their hands on as much unstructured information as possible to feed into their AI models, but no one is doing anything about the quality of this kind of data, or the insights it might provide.
“You can think of our Unstructured Monitoring product and Anomalo Workflows as building blocks that can be assembled in thousands of configurations to achieve pretty much any customer use case for unstructured data quality or insights,” Shmukler said.
The CEO said a large retailer, for example, can use the tool to mine thousands of support tickets and call logs to try to understand why its customers are unhappy with a new product or service. A restaurant operator can use it to surface meaningful insights from dozens of social media comments, reviews and other types of feedback.
“That kind of analysis wasn’t easily possible before Anomalo,” Shmukler pointed out. “Just as we redefined data quality for structured data, we’re now helping enterprises trust and extract value from unstructured data.”
Constellation Research Inc. analyst Michael Ni said this is one sign of the beginning of a new phase of rapid consolidation across the AI and data observability markets. Ni believes enterprises will welcome this consolidation. Because AI workloads are powered mostly by unstructured data, companies need visibility into their vector database stores and the data behind each prompt, the analyst said. Simply monitoring data pipelines and tables is no longer enough.
“Anomalo is bringing observability to documents, chat logs and transcripts, and it may mark the beginning of a new era where trust in AI begins,” Ni said. “It’s also the beginning of the end for siloed data observability, and the next platform battle will be around ‘decision observability,’ where AI signals come together in one trusted view.”
Image: SiliconANGLE/Dreamina