Broken Telephone
Introducing our natural language processor, a dedicated solution for optimizing prompts for Large Language Models (LLMs). In the dynamic realm of artificial intelligence, prompt construction is pivotal for response accuracy. Current approaches, such as empirical prompt engineering, lack scalability and generality. This poses challenges for users: 71.40% in non-professional settings and 85.71% in professional settings report that answers are only sometimes accurate, and 71.40% of all users have experienced frustration. Our research underscores the task-dependency of answers, with procedural tasks faring better than creative ones. Additionally, only 28.60% of users cross-reference AI responses, and all of those who do have found misleading or factually incorrect data. In response, our processor translates human input into optimized prompts. Tailored for high-accuracy, domain-specific contexts, it aims to provide a seamless and reliable experience for both professional and general users of NLP AI tools.
Product Experience
Problem Space
A natural language processor for LLM prompt optimization that enables higher-accuracy responses in professional settings.
Problem Statement
There is a need for a tool that translates human input into prompts optimized for the LLM in use, targeting high-accuracy, domain-specific contexts and generative tasks.
Problem Background
The increasing adoption of natural language processing (NLP) AI models has revealed the unpredictable behaviour of prompts. For instance, asking an AI the same question with different structures produces answers of different quality. Even when both versions contain the same information, the prompt layout influences the answer's accuracy and effectiveness.
The intrinsic characteristics of machine learning (ML) hinder the characterization of data ingestion. In most architectures, it is impossible to infer the structures or keywords that would produce higher-quality answers.
Current solutions, such as prompt engineering, take an empirical approach. This approach is not feasible: considerable time investment yields only limited pattern recognition, and the resulting structures do not generalize to other Large Language Models (LLMs). This issue affects all AI users, particularly when these tools are used in contexts that require a high degree of domain-specific knowledge.
User research reveals that the likelihood of using an NLP AI tool is directly proportional to the accuracy of its answers. Moreover, 71.40% of users in non-professional settings find that responses are sometimes aligned with expectations, while 85.71% of users in professional contexts find answers to be occasionally accurate. 85.71% of users rephrase several times before arriving at an acceptable outcome, while the remaining 14.29% provide more context and information. As a result, 71.40% of all users have encountered challenges or frustration with a response at least once.
Furthermore, 85.71% of all users have noticed that the effectiveness of answers is task-dependent, with creative tasks being the least precise and procedural tasks the most effective. Only 28.60% of general users cross-reference AI answers, and 100% of those have found misleading or factually incorrect data. Finally, user satisfaction averages 3 out of 5 for general usage, 2 out of 5 for professional settings and 2 out of 5 when using non-standard, informal language or colloquialisms.
Therefore, there is a need for a tool that translates human input into prompts optimized for the LLM in use, targeting high-accuracy, domain-specific contexts and generative tasks.
Research Insights
User Pain Points
Highly specialized tasks in professional settings and creative generative tasks in general usage require multiple rounds of prompt customization before they yield an acceptable answer. Users are not familiar with the techniques for writing optimized prompts, and current methods for determining best practices and patterns are not scalable.
Supporting Data
Research found that 71.40% of users in non-professional settings find that responses are sometimes aligned with expectations while 85.71% of users in professional contexts find answers to be occasionally accurate.
Feedback
85.71% of users rephrase several times before arriving at an acceptable outcome, while the remaining 14.29% provide more context and information. As a result, 71.40% of all users have encountered challenges or frustration with a response at least once.
Landing
There is a need for a tool that translates human input into the most optimized prompts for the LLM being used, targeting high-accuracy, domain-specific contexts and generative tasks.
The solution should be a translation layer between the user's input and the base AI prompt input. Its operation should not deviate significantly from current interaction methods. The solution should have a global accuracy metric while characterizing effectiveness and relevance relative to the user. It should be flexible and available on most systems.
The user should be able to review the prompt before it is submitted and to regenerate it based on the target. Because pattern inference and generalization are fundamental capabilities of ML models, a convolutional or transformer-based architecture would be ideal.
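The translation layer described above can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions, not a proposed implementation: the names `PromptDraft`, `translate`, `regenerate` and `submit`, the `TEMPLATES` table and the model names are all hypothetical, and a fixed template stands in for the learned convolutional or transformer-based translator.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical per-model templates (assumption). In the envisioned product a
# trained model would rewrite the input; a template is only a stand-in here.
TEMPLATES = {
    "default": "Task: {text}\nAnswer precisely and state any assumptions.",
    "concise-model": "Answer in at most three sentences: {text}",
}


@dataclass
class PromptDraft:
    user_input: str    # what the user originally typed
    target_model: str  # which LLM the prompt is optimized for
    prompt: str        # translated prompt shown to the user for review
    revision: int = 0  # how many times the draft has been regenerated


def translate(user_input: str, target_model: str) -> PromptDraft:
    """Turn raw user input into a prompt draft for the target model."""
    template = TEMPLATES.get(target_model, TEMPLATES["default"])
    prompt = template.format(text=user_input.strip())
    return PromptDraft(user_input, target_model, prompt)


def regenerate(draft: PromptDraft, target_model: str) -> PromptDraft:
    """Rebuild the draft for a (possibly different) target model."""
    new_draft = translate(draft.user_input, target_model)
    new_draft.revision = draft.revision + 1
    return new_draft


def submit(draft: PromptDraft, send: Callable[[str], str],
           approved: bool) -> Optional[str]:
    """Forward the prompt to the LLM only after the user approved it."""
    if not approved:
        return None
    return send(draft.prompt)
```

In this sketch, `send` is whatever function delivers a prompt to the underlying model, which keeps the layer independent of any particular LLM platform. The review step is a single approval flag, so the interaction stays close to how users already type and send a question.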
Targeted Features
- Users can access the tool independently of the platform of the targeted AI model.
- Users can access the tool in a corporate-controlled environment with minimal permission requirements.
- User interaction does not require multiple steps to generate prompts.
- Users can access the tool in various environments.
- Users can easily further train the translator in a domain-specific task.
- Users in professional settings can safely train the tool without exposing IP.
- Users can keep trained areas under private networks.
- Users can define the relevance of an answer locally.
- Users can define the effectiveness of the answer locally.
- Users can seamlessly switch between profiles with different relevance and effectiveness metrics.
- Users can easily access pre-trained profiles for generative non-professional tasks.
- Users can quickly regenerate and adapt prompts.
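The profile-related features above (locally defined relevance and effectiveness, and switching between profiles) could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `Profile` and `ProfileRegistry` names, the two example metrics (keyword coverage and a word-count limit) and the profile names are hypothetical, and real metrics would be defined by each user or team.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A metric maps (prompt, answer) to a score in [0, 1]. It is defined and
# evaluated locally, so nothing leaves the user's environment.
Metric = Callable[[str, str], float]


@dataclass(frozen=True)
class Profile:
    name: str
    relevance: Metric
    effectiveness: Metric


def keyword_relevance(keywords: List[str]) -> Metric:
    """Example metric (assumption): share of expected keywords in the answer."""
    def score(prompt: str, answer: str) -> float:
        if not keywords:
            return 0.0
        text = answer.lower()
        hits = sum(1 for k in keywords if k.lower() in text)
        return hits / len(keywords)
    return score


def brevity_effectiveness(max_words: int) -> Metric:
    """Example metric (assumption): penalize answers longer than max_words."""
    def score(prompt: str, answer: str) -> float:
        n = len(answer.split())
        return 1.0 if n <= max_words else max_words / n
    return score


class ProfileRegistry:
    """Holds local profiles and tracks which one is active."""

    def __init__(self) -> None:
        self._profiles: Dict[str, Profile] = {}
        self._active: Optional[str] = None

    def add(self, profile: Profile) -> None:
        self._profiles[profile.name] = profile
        if self._active is None:  # first profile becomes the active one
            self._active = profile.name

    def switch(self, name: str) -> None:
        if name not in self._profiles:
            raise KeyError(f"unknown profile: {name}")
        self._active = name

    def evaluate(self, prompt: str, answer: str) -> Dict[str, float]:
        if self._active is None:
            raise RuntimeError("no profile registered")
        p = self._profiles[self._active]
        return {
            "relevance": p.relevance(prompt, answer),
            "effectiveness": p.effectiveness(prompt, answer),
        }


registry = ProfileRegistry()
registry.add(Profile("legal", keyword_relevance(["liability", "clause"]),
                     brevity_effectiveness(50)))
registry.add(Profile("creative", keyword_relevance(["story"]),
                     brevity_effectiveness(4)))
```

Calling `registry.switch("creative")` changes how the same answer is scored without touching the translator itself, which is one way to keep profile swapping a single step for the user.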
User Flows/Mockups
Future Steps
A successful design is indicated by high demand for translations, weighted by short modification chains per session. Moreover, domain-specific training is working well when a low number of translations leads to high satisfaction with accuracy. Finally, a large number of custom tasks indicates increasing adoption of AI tools in professional settings.
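One possible way to turn the first success signal into a number is sketched below in Python. The formula is an assumption chosen for illustration and does not come from the research: each session contributes its number of translations divided by one plus the average number of modifications per translation, so many translations with few edits score highest. The `Session` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    # One entry per translation: how many modifications the user made
    # before accepting the generated prompt.
    chain_lengths: List[int]


def session_score(session: Session) -> float:
    """Translations in the session, discounted by the mean chain length."""
    translations = len(session.chain_lengths)
    if translations == 0:
        return 0.0
    mean_chain = sum(session.chain_lengths) / translations
    return translations / (1.0 + mean_chain)


def design_score(sessions: List[Session]) -> float:
    """Sum of session scores; higher suggests demand with little rework."""
    return sum(session_score(s) for s in sessions)
```

Under this assumed weighting, a session with four translations accepted without changes scores 4.0, while a session with two translations that each needed three modifications scores 0.5.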
The first version of the model should focus on accessibility and model-agnostic operation. The next step in development would be adding safety measures for customized professional training, profile swapping and specialized workspaces. Specialized workspaces should let users move between task domains and modify them quickly, increasing productivity while maintaining accuracy.
Learnings
Product Manager Learnings:
Jimmy Esteban Velasco Granda
The DTTP program has proven to be an enriching experience, particularly through the Co.Lab modules, which have reshaped my perspective.
In Week 1, I gained a crucial insight: prioritizing the problem over the solution. This shift from my previous approach, which centred on fitting a problem into a preconceived solution, highlighted the importance of establishing a solid problem space and statement as the foundation for relevant and useful solutions.
Week 2 delved into data synthesis, emphasizing that the quality of a product is contingent on a deep understanding of customers. Crafting insightful questions revealed trends and behaviours even customers may not be fully aware of.
Week 3 enhanced communication skills, emphasizing that technology's effectiveness relies on seamless team communication. Creating a spec became more than a solution outline; it evolved into a dynamic document, tailored to each team member's understanding.
In Week 4, these learnings culminated in a final assignment, showcasing problem-solving skills, data-backed hypotheses, and effective communication in a condensed pitch.
Co.Lab not only enhanced technical skills but also cultivated essential soft skills like teamwork, communication, time management, and structure.