DATA+AI SUMMIT: Scaling Foundational AI/ML Models and Applying Them to Modern AI/ML Use-Cases

July 18, 2023 Razvan Chiorean

One advantage of the era we live in is that it's not about starting from scratch; it's about starting from value.

automation, data science

Most of us are now familiar with foundational models such as the LLMs behind ChatGPT. However, there is also a wide range of enterprise foundational models that can be quickly implemented, trained and used for enterprise purposes. Starting from such a model makes AI/ML models more effective upon deployment and helps AI teams deliver tangible business benefits efficiently. Databricks serves as the ideal toolkit to facilitate the adoption of this strategy.

In data, analytics and AI, it is worth exploring the driving forces behind today's strong emphasis on data. Data professionals, recognizing the challenges posed by databases, are actively searching for new approaches to tackle these obstacles.

Exploring this domain gives valuable insight into the growing interest in LLMs (large language models) and how they are reshaping the landscape of data, analytics and AI. Continuous developments in the field create opportunities for advancement and offer a glimpse into the future of this dynamic market.

ESG Compliance: Common Use-Case

Imagine you are a large food and beverage supplier, committed to sustainability. You have installed solar panels, implemented an excellent diversity program and have taken all the right steps within your organization. However, there's a critical aspect to consider—the supply chain, comprising tens of thousands of suppliers. To ensure compliance, your supply chain must also adhere to ESG (Environmental, Social and Governance) standards.

Organizations are under increasing pressure to document and improve their ESG performance effectively.

One crucial ingredient in many products today is palm oil. Unfortunately, palm oil production varies widely across countries, ranging from environmentally conscious practices to less controlled, less affluent environments. The palm oil supply chain presents significant challenges: illegal palm oil is sometimes mixed into legitimate shipments, in some cases by literal pirates who pump 30 tons of illegal palm oil into a 70-ton shipment of genuine palm oil. Imagine the consequences for a food distributor if its ESG supply chain were compromised in this way.

ESG is now valued more than ever. Perhaps surprisingly, British American Tobacco (LON:BATS) ranks higher in ESG than Tesla, primarily because of how it manages its supply chain and how transparently it explains its practices. This highlights the monetary value of overseeing the supply chain effectively. It may sound like an exceedingly complex problem, and indeed it is.

It's worth noting that two-thirds of an organization's ESG impact lies within its suppliers. With 30,000 suppliers, for example, it becomes a monumental task to navigate and collaborate with each one effectively. The question arises: How can you efficiently work with such a vast network of suppliers?

Scaling Supplier ESG Compliance through LLM and Applied AI/ML Automation

Imagine that news about a supplier appears in a language other than English, and you realize you were late in picking it up, using it to augment your data and acting on it. How does this delay impact the entire supply chain? How can you predict whether the news is positive or negative?

When you delve into ESG reports, the documents provided by your suppliers are often unstructured, consisting of dozens of pages filled with visually appealing graphs, charts, and copious amounts of text. Extracting meaningful information from such reports can be an almost impossible task. It becomes crucial to predict the value of specific criteria and determine how to score them accurately.

Leveraging Databricks, LLM and Applied AI

Bringing It All Together: Transitioning from Models to Desired Outcomes

The ultimate objective is to converge all data streams into a unified vector database (VectorDB) that acts as a centralized hub. To accomplish this, specialized agents are developed to query both the models and the LLM, so that data about companies, categories and inferences can be summarized.
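As a rough illustration of the centralized-hub idea, the sketch below stores a few supplier snippets together with company and category metadata and retrieves the closest match for a query. It is a toy under stated assumptions, not the architecture presented at the summit: `ToyVectorStore`, `embed` and the sample records are hypothetical, and the bag-of-words "embedding" stands in for a learned embedding model and a real vector database.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a unified vector database."""

    def __init__(self):
        self.rows = []  # each row: (vector, text, metadata)

    def add(self, text: str, **metadata) -> None:
        self.rows.append((embed(text), text, metadata))

    def search(self, query: str, k: int = 2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]


store = ToyVectorStore()
store.add(
    "supplier a reports reduced deforestation in palm oil sourcing",
    company="Supplier A",
    category="environmental",
)
store.add(
    "supplier b fined for unsafe labor conditions",
    company="Supplier B",
    category="social",
)
store.add(
    "supplier c publishes board independence policy",
    company="Supplier C",
    category="governance",
)

top_hits = store.search("palm oil deforestation", k=1)
```

An agent would pass the retrieved text and metadata to a summarization model instead of sending the raw query straight to the LLM.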

However, several crucial considerations arise during this process:

Safeguarding Private Data: Which specific data elements require isolation and strict privacy measures?
Establishing Trust Scores for News Sources: How can we construct reliable trust scores to evaluate the credibility of various news sources?
Preventing Hallucinations: What measures can be implemented to prevent the occurrence of misleading or false data interpretations?

By addressing these considerations, we ensure the reliability, privacy, and accuracy of the system, facilitating the effective transformation of models into tangible outcomes.
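One possible way to approach the trust-score question above is to track how often a source's past reports were later confirmed, and to smooth that ratio so a source with little history is not over-trusted. The sketch below is an assumption, not the scoring method from the talk: `SourceHistory`, `trust_score`, the prior values and the sample counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceHistory:
    name: str
    confirmed: int  # past reports later corroborated (hypothetical counts)
    refuted: int    # past reports later contradicted


def trust_score(
    history: SourceHistory,
    prior_confirmed: float = 1.0,
    prior_refuted: float = 1.0,
) -> float:
    """Smoothed share of confirmed reports (mean of a Beta posterior).

    The priors pull sources with little history toward 0.5.
    """
    total = history.confirmed + history.refuted + prior_confirmed + prior_refuted
    return (history.confirmed + prior_confirmed) / total


sources = [
    SourceHistory("established_wire", confirmed=48, refuted=2),
    SourceHistory("new_blog", confirmed=1, refuted=0),
    SourceHistory("rumor_site", confirmed=3, refuted=9),
]
scores = {s.name: round(trust_score(s), 3) for s in sources}
```

With these sample counts the established source scores highest, the new blog sits in the middle despite a perfect but very short record, and the rumor site scores lowest.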

Harness the Power of Dolly 2.0 or Other Open Source Models for Summarization Tasks

These models prove highly beneficial in extracting relevant information while maintaining quality and quantity scores aligned with the summarization and inference models.

Converge the extracted data into a comprehensive feature store, ensuring easy accessibility and organization.

Establish a Control Loop Incorporating Data and Results

Construct a robust control loop that seamlessly integrates data inputs and generated results. This iterative process ensures continuous feedback and improvement within the system.

Develop both tabular data structures and embeddings to effectively represent and organize the data. These complementary approaches enable comprehensive analysis and facilitate the extraction of valuable insights.

Anti-hallucination checks with model inference

Leverage feature computation techniques to validate outputs derived from the agents. By deriving relevant features from the data, the accuracy and reliability of the outputs can be assessed and validated.
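A minimal sketch of such a check, assuming the features have already been computed into a simple dictionary: every number that appears in an agent's generated summary must match a computed feature value, and anything else is flagged for review. The function names, the tolerance and the sample values are hypothetical, and a production check would compare far more than numbers.

```python
import re


def extract_numbers(text: str) -> set:
    """Collect every integer or decimal number mentioned in the text."""
    return {float(match) for match in re.findall(r"\d+(?:\.\d+)?", text)}


def unsupported_numbers(summary: str, features: dict, tolerance: float = 0.01) -> set:
    """Return numbers in the summary that match no computed feature value.

    A number is supported if it is within a relative tolerance of some feature.
    """
    known = list(features.values())
    return {
        number
        for number in extract_numbers(summary)
        if not any(abs(number - k) <= tolerance * max(1.0, abs(k)) for k in known)
    }


# Hypothetical features computed from a supplier's ESG report.
features = {"renewable_energy_pct": 42.0, "audited_sites": 17.0}

good = "The supplier sources 42 percent renewable energy across 17 audited sites."
bad = "The supplier sources 85 percent renewable energy across 17 audited sites."

good_flags = unsupported_numbers(good, features)
bad_flags = unsupported_numbers(bad, features)
```

Here the first summary passes with no flags, while the second is flagged because 85 does not correspond to any computed feature.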

Train models to infer outcomes and provide substantial support for further validation. These trained models enhance the decision-making process by generating predictions and insights that can be cross-validated against real-world scenarios and data.

Merge ML with Search for Outcome Validation

By merging machine learning (ML) techniques with search capabilities, organizations can effectively validate outcomes. This integrated approach not only mitigates the risk of inaccuracies but also generates referenceable metrics that align with business objectives.
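To make the idea concrete, the sketch below accepts a model's prediction only when it is both confident and backed by enough retrieved documents, and it returns the supporting document IDs so that the outcome can be referenced later. This is an illustration under assumptions: `validate_prediction`, the thresholds, and the `(doc_id, label)` shape of the search hits are hypothetical.

```python
def validate_prediction(
    predicted_label: str,
    confidence: float,
    search_hits: list,
    min_confidence: float = 0.8,
    min_support: int = 2,
) -> dict:
    """Cross-check an ML prediction against labeled search results.

    search_hits is a list of (doc_id, label) pairs returned by retrieval.
    """
    supporting = [doc_id for doc_id, label in search_hits if label == predicted_label]
    support_ratio = len(supporting) / len(search_hits) if search_hits else 0.0
    return {
        "accepted": confidence >= min_confidence and len(supporting) >= min_support,
        "support_ratio": support_ratio,  # a referenceable metric
        "references": supporting,        # documents that back the outcome
    }


# Hypothetical retrieval results for one supplier.
hits = [("news-101", "negative"), ("report-7", "negative"), ("news-230", "positive")]
result = validate_prediction("negative", 0.91, hits)
```

A confident prediction with too little supporting evidence is rejected, which is the point of pairing the model with search.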

Configure retrieval

Tip: Take a step-by-step approach, ensuring the ability to predict outcomes with high confidence before progressing to the next stage. This incremental strategy ensures a solid foundation and minimizes potential pitfalls along the way.

Empower Multiple Expert Agents with Controlled Responses

Develop a diverse set of expert agents, each specializing in a particular domain. These agents bring expertise and precision to their respective areas, with clear controls and boundaries on their responses. Implement a query router that directs each query to a specific model based on the model's performance and the sensitivity of the data involved. This ensures optimal utilization of models while adhering to specific requirements and considerations.

By creating multiple expert agents and utilizing a query router, organizations can effectively leverage their capabilities while maintaining control and sensitivity in responses.
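The router idea could be sketched as follows, assuming each agent is described by its domain, an offline accuracy measurement and a flag for whether it may receive private data. The `Agent` fields, the `route` function and the sample agents are hypothetical and only illustrate the selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    domain: str
    accuracy: float        # measured offline (hypothetical metric)
    handles_private: bool  # True if private data may be sent to this agent


def route(domain: str, contains_private_data: bool, agents: list) -> Agent:
    """Pick the most accurate agent for the domain that is allowed to see the data."""
    candidates = [
        agent
        for agent in agents
        if agent.domain == domain
        and (agent.handles_private or not contains_private_data)
    ]
    if not candidates:
        raise LookupError(f"no eligible agent for domain {domain!r}")
    return max(candidates, key=lambda agent: agent.accuracy)


agents = [
    Agent("hosted_esg_llm", "environmental", accuracy=0.92, handles_private=False),
    Agent("private_esg_model", "environmental", accuracy=0.85, handles_private=True),
    Agent("labor_news_model", "social", accuracy=0.88, handles_private=True),
]

public_choice = route("environmental", contains_private_data=False, agents=agents)
private_choice = route("environmental", contains_private_data=True, agents=agents)
```

In this sketch sensitivity acts as a hard filter and performance only breaks ties among eligible agents, so a more accurate model never receives data it is not cleared to handle.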

Enhancing the Applied Approach

Through an applied approach, organizations can unlock the potential to generate highly trustworthy and detailed insights that can be rapidly summarized. Recommendations can evolve into independent agents of excellence, capable of delivering valuable guidance and expertise.

By focusing on available data, the reliance on chat is minimized, reducing the risk of misinformation and ensuring more accurate and reliable outcomes.

SUMMARY

Streamlined Multi-Modal Information Extraction

Advanced information extraction solutions combine Natural Language Processing (NLP) and Generative AI techniques to extract crucial details from various documents, including invoices. This automated extraction process delivers high accuracy, saves time and mitigates the risk of errors.

The solution is designed to handle diverse document types, encompassing embedded images, unstructured PDF templates and structured tables. It can even extract valuable information from video, audio and other document assets, making it a versatile tool applicable across multiple industries.

By integrating multiple AI techniques tailored to different document types, including computer vision, generative AI, and NLP, this multi-modal approach enhances the adaptability, effectiveness and precision of the extraction process. This solution offers a powerful and dependable information extraction capability for a wide range of applications.
