Sample Queries (Optional)
Provide sample queries to demonstrate the types of questions your chatbot should address, ensuring compatibility with the database schema. Use the template below to guide your query formulation.
Blueprint for Chatbot Prompt Templates:
1. **Intent Classification Prompt**:
   - **Purpose**: Classify the user’s query to determine its intent and relevance to the database.
   - **Structure**:
     - Input variables: user_query, available_databases_summary, database_content_summary, conversation_summary
     - Template:
       - Describe the AI’s role as a specialized chatbot for your database’s domain (e.g., plant ncRNAs or fusion transcripts).
       - Specify the databases served (e.g., MyDatabase).
       - Instruct the AI to interpret queries like “what data do you have on [topic]?” as data_preview requests.
       - List classification categories: METADATA_DIRECT_ANSWER_PREFERRED, METADATA_SQL_FALLBACK_PREFERRED, DATA_RETRIEVAL, AMBIGUOUS, TYPO, GENERAL_CONVERSATION, OUT_OF_SCOPE.
       - Define each category with examples relevant to your database’s content.
       - Output only the classification string.
   - **Example Usage**: “Classify whether a query like ‘List fusion transcripts in rice’ is a data retrieval request or if ‘What is the time?’ is out of scope.”
2. **SQL Planning Prompt**:
   - **Purpose**: Generate SQLite-compatible SQL queries or metadata responses based on the query’s intent.
   - **Structure**:
     - Input variables: user_query, classified_intent, db_file_name, schemas_json, knowledge_graph, database_table_mapping_info, DISPLAY_ROW_LIMIT, retrieved_knowledge_context, conversation_summary, previous_query_summary
     - Template:
       - Define the AI’s role as a specialized chatbot for your database.
       - Specify that the output must be a JSON object wrapped in the designated start and end delimiters.
       - Provide a database overview (e.g., key columns such as gene1, gene2, and species).
       - Outline rules for handling query types (e.g., gene ID queries check multiple tables; expression queries target specific columns).
       - Include SQL best practices (e.g., quote table and column names with double quotes, and use single quotes for string literals).
       - Define the JSON structure: query_type, analysis_plan, direct_answer_from_metadata, and queries (each with sql, target_table, database_conceptual_name, description, purpose_type, and display_columns_hint).
   - **Example Usage**: “Generate SQL to retrieve fusion transcripts for gene AT1G12345 from MyDatabase.”
3. **Summary Interpretation Prompt**:
   - **Purpose**: Rephrase system-generated data summaries into user-friendly, conversational responses.
   - **Structure**:
     - Input variables: user_query, analysis_plan_from_stage1, textual_data_summary_from_python, knowledge_graph_snippet, conversation_summary, query_type_from_stage1, DISPLAY_ROW_LIMIT
     - Template:
       - Specify the AI’s role and database focus.
       - Instruct the AI to output a JSON object with “summary” and “databases_conceptually_involved” fields.
       - Require the summary to be based solely on the system-generated data summary and to refer to databases by their conceptual names.
       - Define rules for handling data previews, aggregations, errors, and common-item analyses (e.g., normalize species names such as ‘oryza sativa japonica’ to ‘oryza sativa’).
       - Emphasize a professional tone, and replace underscores with spaces for readability.
   - **Example Usage**: “Rephrase a system summary stating ‘Found 150 fusion transcripts’ into: ‘MyDatabase identified 150 fusion transcripts for your query.’”
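
For the Intent Classification Prompt, the following minimal Python sketch (not a prescribed implementation) fills the template and checks the model's reply. The category names come from the blueprint. The template wording, the extra `domain` variable, and the fallback to AMBIGUOUS for unrecognized replies are assumptions. `string.Template.substitute` raises `KeyError` when an input variable is missing, which catches incomplete prompts before they reach the model.

```python
from string import Template

# Classification categories taken from the blueprint.
INTENT_CATEGORIES = (
    "METADATA_DIRECT_ANSWER_PREFERRED",
    "METADATA_SQL_FALLBACK_PREFERRED",
    "DATA_RETRIEVAL",
    "AMBIGUOUS",
    "TYPO",
    "GENERAL_CONVERSATION",
    "OUT_OF_SCOPE",
)

# Hypothetical template wording; $domain is an assumed extra variable.
INTENT_PROMPT = Template(
    "You are a specialized chatbot for $domain.\n"
    "Databases served: $available_databases_summary\n"
    "Database contents: $database_content_summary\n"
    "Conversation so far: $conversation_summary\n"
    "Interpret questions like 'what data do you have on [topic]?' as data_preview requests.\n"
    "Classify the user query as exactly one of: $categories.\n"
    "Output only the classification string.\n\n"
    "User query: $user_query"
)


def build_intent_prompt(**variables: str) -> str:
    """Fill the template; substitute() raises KeyError if a variable is missing."""
    return INTENT_PROMPT.substitute(categories=", ".join(INTENT_CATEGORIES), **variables)


def parse_intent(raw_output: str) -> str:
    """Normalize the model's reply; unknown labels fall back to AMBIGUOUS (an assumed policy)."""
    label = raw_output.strip().strip("`'\".").upper()
    return label if label in INTENT_CATEGORIES else "AMBIGUOUS"
```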
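
For the SQL Planning Prompt, the sketch below checks that the planner's output contains the keys the blueprint lists. It then runs each query with Python's built-in sqlite3 module and keeps at most DISPLAY_ROW_LIMIT rows. It assumes the JSON object has already been extracted from its delimiters. The limit value, the `quote_ident` helper, the SELECT/WITH guard, and the toy table are illustrative assumptions. The guard is coarse, so a real deployment would also open the database read-only, for example with `sqlite3.connect("file:mydb.sqlite?mode=ro", uri=True)`, where the file name is a placeholder.

```python
import json
import sqlite3

# Top-level and per-query keys listed in the blueprint's JSON structure.
PLAN_KEYS = {"query_type", "analysis_plan", "direct_answer_from_metadata", "queries"}
QUERY_KEYS = {"sql", "target_table", "database_conceptual_name",
              "description", "purpose_type", "display_columns_hint"}
DISPLAY_ROW_LIMIT = 20  # assumed value; use your own display limit


def quote_ident(name: str) -> str:
    """Double-quote an SQLite table/column name, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def validate_plan(raw_json: str) -> dict:
    """Parse the planner's JSON (already extracted from its delimiters) and check its keys."""
    plan = json.loads(raw_json)
    missing = PLAN_KEYS - plan.keys()
    if missing:
        raise ValueError(f"plan is missing keys: {sorted(missing)}")
    for query in plan["queries"]:
        missing = QUERY_KEYS - query.keys()
        if missing:
            raise ValueError(f"query is missing keys: {sorted(missing)}")
        # Coarse, assumed safety rule: only statements starting with SELECT or WITH run.
        if not query["sql"].lstrip().upper().startswith(("SELECT", "WITH")):
            raise ValueError("only SELECT/WITH statements are allowed")
    return plan


def run_queries(conn: sqlite3.Connection, plan: dict) -> list:
    """Run each planned query, keeping at most DISPLAY_ROW_LIMIT rows per query."""
    results = []
    for query in plan["queries"]:
        rows = conn.execute(query["sql"]).fetchmany(DISPLAY_ROW_LIMIT)
        results.append((query["description"], rows))
    return results


if __name__ == "__main__":
    # Toy in-memory table; the second gene ID is illustrative, not real data.
    conn = sqlite3.connect(":memory:")
    table = quote_ident("fusion_transcripts")
    conn.execute(f"CREATE TABLE {table} (gene1 TEXT, gene2 TEXT, species TEXT)")
    conn.execute(f"INSERT INTO {table} VALUES ('AT1G12345', 'AT3G00001', 'arabidopsis thaliana')")
    plan = validate_plan(json.dumps({
        "query_type": "data_retrieval",  # illustrative value
        "analysis_plan": "Find fusion transcripts where either partner is AT1G12345.",
        "direct_answer_from_metadata": None,
        "queries": [{
            "sql": f"SELECT \"gene1\", \"gene2\" FROM {table} "
                   "WHERE \"gene1\" = 'AT1G12345' OR \"gene2\" = 'AT1G12345'",
            "target_table": "fusion_transcripts",
            "database_conceptual_name": "MyDatabase",
            "description": "Fusion transcripts involving AT1G12345",
            "purpose_type": "data_preview",  # illustrative value
            "display_columns_hint": ["gene1", "gene2"],
        }],
    }))
    print(run_queries(conn, plan))
```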
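
For the Summary Interpretation Prompt, a hedged sketch of the supporting checks follows. `parse_summary_response` verifies that the reply carries the “summary” and “databases_conceptually_involved” fields. The helpers apply the normalization rules mentioned above: merging species variants before counting and replacing underscores with spaces. The alias table and the expectation that “databases_conceptually_involved” is a list of strings are assumptions, not part of the blueprint.

```python
import json
import re
from collections import Counter

# Assumed normalization table; add the subspecies or cultivar variants your data uses.
SPECIES_ALIASES = {"oryza sativa japonica": "oryza sativa"}


def normalize_species(name: str) -> str:
    """Lowercase, turn underscore/whitespace runs into single spaces, then map aliases."""
    cleaned = re.sub(r"[\s_]+", " ", name).strip().lower()
    return SPECIES_ALIASES.get(cleaned, cleaned)


def count_species(values: list[str]) -> Counter:
    """Count species after normalization so variants merge before being reported."""
    return Counter(normalize_species(v) for v in values)


def humanize(label: str) -> str:
    """Replace underscores with spaces, e.g. in column or table names."""
    return label.replace("_", " ")


def parse_summary_response(raw_json: str) -> dict:
    """Check that the reply has both required fields (the list type is an assumption)."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if not isinstance(data.get("summary"), str):
        raise ValueError("'summary' must be a string")
    involved = data.get("databases_conceptually_involved")
    if not isinstance(involved, list) or not all(isinstance(d, str) for d in involved):
        raise ValueError("'databases_conceptually_involved' must be a list of strings")
    return data
```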
**Instructions**:
- Use this blueprint to create prompts tailored to your database’s schema and content.
- Ensure prompts handle your database’s specific data types (e.g., lncRNAs, fusion transcripts) as well as queries about unsupported data types (e.g., siRNAs).
- Test prompts with your sample queries to verify accurate intent classification and data retrieval.
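
A small harness can test prompts against sample queries by comparing expected and actual intents. This is a sketch. The two cases reuse the example queries from the Intent Classification Prompt. `stub_classifier` is a hypothetical keyword-based stand-in for the real model call and is only useful for wiring up the harness. Replace it with a function that sends the filled prompt to your model.

```python
from typing import Callable, Iterable

# Sample queries paired with expected intents; extend with your own.
SAMPLE_CASES = [
    ("List fusion transcripts in rice", "DATA_RETRIEVAL"),
    ("What is the time?", "OUT_OF_SCOPE"),
]


def check_intents(classify: Callable[[str], str],
                  cases: Iterable[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (query, expected, actual) for every case the classifier gets wrong."""
    failures = []
    for query, expected in cases:
        actual = classify(query)
        if actual != expected:
            failures.append((query, expected, actual))
    return failures


def stub_classifier(query: str) -> str:
    """Hypothetical stand-in for the real LLM call; replace it before testing prompts."""
    return "DATA_RETRIEVAL" if "fusion" in query.lower() else "OUT_OF_SCOPE"


if __name__ == "__main__":
    for query, expected, actual in check_intents(stub_classifier, SAMPLE_CASES):
        print(f"MISMATCH: {query!r}: expected {expected}, got {actual}")
```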