Spaces: atz21/leadib

atz21 committed on Sep 1, 2025

Commit 40d9691 · verified · 1 parent: 380638f

Update app.py

Files changed (1)

app.py +147 -156

@@ -2,213 +2,204 @@ import os
 import gradio as gr
 import google.generativeai as genai
 from markdown_pdf import MarkdownPdf, Section
-# -------------------- CONFIG --------------------
-genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
 # ---------- PROMPTS ----------
-TRANSCRIPTION_PROMPT = """Your Role: You are an expert technical transcriber specializing in mathematical and scientific documents. Your mission is to convert handwritten solutions from a provided image or PDF into a clean, accurate, and logically structured Markdown format.
-Primary Objective: Preserve the author's intended solution path while filtering out all mistakes, corrections, and extraneous marks. The final output must be perfectly formatted and easy to follow.
-Core Instructions:
-Hierarchical Structure:
-- Identify all questions and subquestions based on their numbering (e.g., 1. a), i)).
-- Use ## for main questions (e.g., ## Question 1).
-- Use ### for subquestions (e.g., ### a), ### i)).
-- If a question number appears out of its logical sequence, transcribe it with the label provided in the source.
-What to Exclude (Content Filtration):
-- Mistakes: Completely ignore and do not transcribe any number, variable, or expression that has been struck through, scribbled over, or crossed out. Transcribe only the corrected, final version.
-- Extraneous Marks: Do not include any doodles, underlines (unless part of a fraction), or stray marks not relevant to the solution.
-Crucial Distinction: Cancellations vs. Step Cuts:
-- Term Cancellation: This is a valid mathematical step where terms cancel each other out (e.g., +2x and -2x, or a term divided by itself).
-  Action: Transcribe the step where the cancellation occurs. Immediately after that line, add a concise, bracketed note explaining what was cancelled.
-- Step Cut: This is when the author skips intermediate algebraic or arithmetic steps (e.g., jumping from 2b = 2 directly to b = 1).
-  Action: Transcribe the steps exactly as they appear. Do not invent or add the missing steps. The logical jump in the transcribed output serves to represent the step cut.
-Formatting and Special Cases:
-- Equations: Enclose all mathematical equations and multi-line calculations in Markdown code blocks for clarity and proper rendering.
-- Illegibility: If a specific word or number is impossible to read, use the placeholder [illegible].
-- Graphs: Do not attempt to recreate graphs. Instead, describe them textually. Note the type of curve (e.g., parabola, polynomial) and list any labeled key points like intercepts, vertices, or asymptotes.
 """
-# Full 9-rule grading prompt (point 9 is Presentation; "Calculators" section removed)
-GRADING_PROMPT = """Instructions to Examiners
 Abbreviations:
 - M: Marks awarded for attempting to use a correct Method.
 - A: Marks awarded for an Answer or for Accuracy; often dependent on preceding M marks.
 - R: Marks awarded for clear Reasoning.
 - AG: Answer given in the question and so no marks are awarded.
 - FT: Follow through. The practice of awarding marks, despite candidate errors in previous parts, for their correct methods/answers using incorrect results.
 --------------------------------------------
-Using the Markscheme
 ## 1. General
 Award marks using the annotations as noted in the markscheme (e.g., M1, A2).
 ## 2. Method and Answer/Accuracy marks
-- Do not automatically award full marks for a correct answer; all working must be checked, and marks awarded according to the markscheme.
-- It is generally not possible to award M0 followed by A1, as A marks depend on the preceding M mark(s).
-- Where M and A marks are noted on the same line, e.g. M1A1, this usually means:
-  - M1: attempt to use an appropriate method (e.g. substitution into a formula).
-  - A1: correct values used.
-- Where there are two or more A marks on the same line, they may be awarded independently; so if the first value is incorrect, but the next two are correct, award A0A1A1.
-- Where the markscheme specifies A3, M2 etc., do not split the marks unless a note allows it.
-- The response to a “show that” question does not need to restate the AG line, unless a Note makes this explicit.
-- Once a correct answer is seen, ignore further working, even if incorrect, unless that incorrect working is then used in a later part. In such cases, award FT marks as appropriate but withhold the final A1.
 ## 3. Implied marks
-Implied marks appear in brackets, e.g. (M1), and can only be awarded if correct work is seen or implied by subsequent working/answer.
 ## 4. Follow through (FT) marks
-- FT marks are awarded where an incorrect answer from one part is used correctly in later parts.
-- Usually, working must be present. However, if all marks in a part are for the final answer or implied, then FT marks may be given even with no working.
-- Within a question part, once an error is made, no further A marks can be awarded for work using the error, but M marks may still be awarded.
-- If the question becomes simpler due to the error, use discretion to award fewer FT marks.
-- If the error gives an inappropriate value (e.g., probability > 1, non-integer where integer required), do not award the final A mark.
-- If the candidate’s answer clearly contradicts information given in the question, FT marks should not be awarded.
-- Exceptions to these rules will be explicitly noted in the markscheme.
 ## 5. Mis-read (MR)
-- If a candidate copies values incorrectly from the question, this is a misread (MR). Penalize only once for that misread.
-- Do not award the first mark (even if it is an M mark), but award all others as appropriate.
-- If the question becomes much simpler due to MR, use discretion to award fewer marks.
-- If MR leads to an inappropriate value (e.g., probability > 1), do not award the final A mark.
-- Mis-copying their own work is an error, not MR.
-- MR can only be applied when working is seen.
 ## 6. Alternative methods
-- Unless the question specifies a method, other correct methods should be credited.
-- If the command term is “Hence” (and not “Hence or otherwise”), alternative methods are not permitted unless explicitly noted.
-- Alternative methods for whole questions are labeled METHOD 1, METHOD 2, etc.
-- Alternative solutions for parts are labeled EITHER … OR.
 ## 7. Alternative forms
-- Accept equivalent forms unless the question specifies otherwise.
-- Accept international formats (e.g., 1.9 = 1,9 = 1·9; 1000 = 1,000 = 1.000).
-- Do not accept final answers in calculator notation. Intermediate working may use it if it shows the required demand.
-- Equivalent algebraic/numeric answers may appear in brackets in the markscheme, but examiners should use discretion for equivalence.
 ## 8. Format and accuracy of answers
-- If accuracy is specified in the question, marks depend on the answer being given to that accuracy.
-- Otherwise, numerical answers should be exact or correct to 3 significant figures.
-- If values are carried forward, candidates may use the exact value or the correct 3 s.f. version.
-**Simplification rules:**
-- Arithmetic should be simplified where possible.
-  - Example: 25/4 should be written as 6.25.
-  - Example: 10/5 should be written as 2.
-- Fractions do not need to be in lowest terms, but numerator and denominator must be integers.
-- Algebra should be simplified:
-  - Example: 4e × e^x → 4e^(x+1).
-  - Example: x² × x³ × x⁴ → x⁹.
-- Factorized vs. expanded forms: both acceptable unless otherwise specified.
-- Intermediate A marks do **not** need to be simplified.
 ## 9. Presentation of candidate work
-- Crossed-out work: do not mark unless candidate indicates it should be considered.
-- More than one solution: mark only the first response unless candidate specifies otherwise.
 """
-# ---------- HELPER: Save to PDF using markdown-pdf ----------
 def save_as_pdf(text, filename="output.pdf"):
     pdf = MarkdownPdf()
     pdf.add_section(Section(text, toc=False))
     pdf.save(filename)
     return filename
-# ---------- STEP 1: TRANSCRIPTION ----------
-def transcribe(ans_file):
     try:
-        ans_uploaded = genai.upload_file(path=ans_file, display_name="Answer Sheet")
-        model = genai.GenerativeModel("gemini-2.5-pro", generation_config={"temperature": 0})
-        resp = model.generate_content([TRANSCRIPTION_PROMPT, ans_uploaded])
-        transcription = getattr(resp, "text", None)
-        if not transcription and resp.candidates:
-            transcription = resp.candidates[0].content.parts[0].text
-        pdf_path = save_as_pdf(transcription, "transcription.pdf")
-        return transcription, pdf_path
-    except Exception as e:
-        return f"❌ Error during transcription: {e}", None
-# ---------- STEP 2: GRADING ----------
-def grade(qp_file, ms_file, transcription):
-    try:
-        qp_uploaded = genai.upload_file(path=qp_file, display_name="Question Paper")
-        ms_uploaded = genai.upload_file(path=ms_file, display_name="Marking Scheme")
-        model = genai.GenerativeModel("gemini-2.5-pro", generation_config={"temperature": 0})
-        # Prompt that embeds the full 9 rules and enforces a structured grading table
-        structured_instructions = (
-            "You are an official examiner. Use the following grading rules strictly:\n\n"
-            f"{GRADING_PROMPT}\n\n"
-            "OUTPUT FORMAT (use GitHub-flavored Markdown table):\n\n"
-            "| Student wrote | Marks Awarded | Reason (reference the rules; specify error type: A : All Good , B : Silly Mistake , C : Conceptual Error , D : Hard question ,  E : Not Applicable) |\n"
-            "|---|---|---|\n"
-            "Then, after the table, provide a short 'Summary & Final Mark' section with totals and any FT usage noted.\n\n"
-            "Guidelines:\n"
-            "1) Apply marks exactly as per the markscheme.\n"
-            "2) Justify each awarded or withheld mark with explicit references to the numbered rules.\n"
-            "3) Classify all errors (Conceptual Error, Silly Mistake, Misread, or None).\n"
-            "4) Enforce dependency between M and A marks (no A awarded if M not earned) and indicate FT when applied.\n"
-            "5) Do not invent marks that are not present in the markscheme.\n"
-            "6) Provide step-by-step reasoning for each mark awarded or withheld.\n"
-        )
         response = model.generate_content([
-            structured_instructions,
-            qp_uploaded,      # uploaded question paper
-            ms_uploaded,      # uploaded marking scheme
-            transcription     # student's transcription
         ])
         grading = getattr(response, "text", None)
         if not grading and response.candidates:
             grading = response.candidates[0].content.parts[0].text
-        pdf_path = save_as_pdf(grading, "grading.pdf")
-        return grading, pdf_path
     except Exception as e:
-        return f"❌ Error during grading: {e}", None
 # ---------- GRADIO APP ----------
-with gr.Blocks(title="LeadIB AI Grading") as demo:
-    gr.Markdown("## LeadIB AI Grading\nUpload exam documents to transcribe and grade student answers step by step.")
     with gr.Row():
         qp_file = gr.File(label="Upload Question Paper (PDF)", type="filepath")
-        ms_file = gr.File(label="Upload Mark Scheme (PDF)", type="filepath")
         ans_file = gr.File(label="Upload Student Answer Sheet (PDF)", type="filepath")
-    # Step 1: Transcription
-    transcribe_btn = gr.Button("Step 1: Transcribe Answer Sheet")
     with gr.Row():
-        transcription_out = gr.Textbox(label="📄 Student Transcription", lines=20)
-        transcription_pdf = gr.File(label="⬇️ Download Transcription (PDF)")
-    # Step 2: Grading
-    grade_btn = gr.Button("Step 2: Grade the Student")
     with gr.Row():
-        grading_out = gr.Textbox(label="✅ Grading Report (Step-by-Step)", lines=20)
-        grading_pdf = gr.File(label="⬇️ Download Grading (PDF)")
-    # Button Logic
-    transcribe_btn.click(
-        fn=transcribe,
-        inputs=[ans_file],
-        outputs=[transcription_out, transcription_pdf],
-        show_progress=True
-    )
-    grade_btn.click(
-        fn=grade,
-        inputs=[qp_file, ms_file, transcription_out],
-        outputs=[grading_out, grading_pdf],
         show_progress=True
     )

 import gradio as gr
 import google.generativeai as genai
 from markdown_pdf import MarkdownPdf, Section
+import pikepdf
 # ---------- PROMPTS ----------
+PROMPTS = {
+    "ALIGNMENT_PROMPT": {
+        "role": "system",
+        "content": """Your Role: You are an expert examiner and transcription specialist.
+Your task is to **align three sources**:
+- Question Paper (QP)
+- Markscheme (MS)
+- Student Answer Sheet (AS)
+### Instructions
+1. Parse all documents carefully and align them **per question and sub-question**.
+2. For each question/sub-question, produce a structured block:
+---
+## Question X (and sub-question if applicable)
+**QP:** [Insert the exact question text]
+**MS:** [Insert the relevant part of the markscheme]
+**AS:** [Insert the student's final cleaned answer transcription]
+---
+3. Formatting Rules:
+- Use `##` for main questions and `###` for sub-questions.
+- Write **QP | MS | AS** exactly in that order.
+- Preserve all mathematical expressions inside fenced code blocks.
+- Do not re-create diagrams/graphs. Write `[Graph omitted]`.
+- If part of the student's answer is unreadable, write `[illegible]`.
+- If a student skipped a question, write `[No response]`.
+- Keep MS annotations (M1, A1, R1, etc.) exactly as in the original.
+4. Output must be **clean, deterministic, and consistent** — so that another model can grade directly using this aligned representation.
+### Example
+## Question 1
+**QP:** Expand `(1+x)^3`
+**MS:** M1 for binomial expansion, A1 for coefficients, A1 for final form
+**AS:**
 """
+    },
+    "GRADING_PROMPT": {
+        "role": "system",
+        "content": """You are an official examiner. Use the following grading rules strictly.
 Abbreviations:
 - M: Marks awarded for attempting to use a correct Method.
 - A: Marks awarded for an Answer or for Accuracy; often dependent on preceding M marks.
 - R: Marks awarded for clear Reasoning.
 - AG: Answer given in the question and so no marks are awarded.
 - FT: Follow through. The practice of awarding marks, despite candidate errors in previous parts, for their correct methods/answers using incorrect results.
 --------------------------------------------
 ## 1. General
 Award marks using the annotations as noted in the markscheme (e.g., M1, A2).
 ## 2. Method and Answer/Accuracy marks
+- Do not automatically award full marks for a correct answer; all working must be checked.
+- It is generally not possible to award M0 followed by A1.
+- Where M and A marks are noted on the same line (M1A1), M is for method, A is for accuracy.
+- Multiple A marks can be independent.
 ## 3. Implied marks
+Implied marks (M1) can only be awarded if correct work is seen or implied.
 ## 4. Follow through (FT) marks
+- Award FT if an earlier wrong answer is used consistently later.
+- Do not award FT if the result contradicts the question (e.g., probability > 1).
 ## 5. Mis-read (MR)
+- Penalize once if the candidate misreads a value.
+- Award other marks as appropriate.
 ## 6. Alternative methods
+- Accept valid alternatives unless "Hence" forbids it.
 ## 7. Alternative forms
+- Accept equivalent numeric/algebraic forms unless specified otherwise.
 ## 8. Format and accuracy of answers
+- Use correct accuracy (3 s.f. if not specified).
+- Arithmetic and algebra should be simplified.
 ## 9. Presentation of candidate work
+- Ignore crossed-out work unless indicated.
+- Mark only the first solution unless candidate specifies otherwise.
+## 10. Graph/Diagram Questions
+- If a question requires drawing or interpreting a graph/diagram, assume the student has done it correctly and award full marks for that part.
+--------------------------------------------
+### OUTPUT FORMAT
+Produce a GitHub-flavored Markdown table with 3 columns:
+| Student wrote | Marks Awarded | Reason |
+|---------------|---------------|--------|
+Special Formatting Rule:
+- Whenever a mark is lost (M0, A0, R0 etc.), wrap it in red using: `<span style="color:red">M0</span>`.
+- Also wrap the corresponding Reason in red color.
+- Keep awarded marks (M1, A1, etc.) in plain text.
+- If mixed (e.g., M1A0A1), only highlight the lost marks (`A0`) and its reason.
+After the table, provide:
+### Summary & Final Mark
+- Total marks obtained vs total available
+- Any FT (follow-through) applied
+- Classification of errors (Conceptual, Silly mistake, Misread, etc.)
 """
+    }
+}
+# -------------------- CONFIG --------------------
+genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
+# ---------- HELPER: Save to PDF ----------
 def save_as_pdf(text, filename="output.pdf"):
     pdf = MarkdownPdf()
     pdf.add_section(Section(text, toc=False))
     pdf.save(filename)
     return filename
+# ---------- HELPER: PDF Compression ----------
+def compress_pdf(input_path, output_path):
     try:
+        pdf = pikepdf.open(input_path)
+        pdf.save(output_path, optimize_version=True)
+        pdf.close()
+        return output_path
+    except Exception as e:
+        print(f"❌ Compression failed for {input_path}: {e}")
+        return input_path
+def check_and_compress(path):
+    if os.path.getsize(path) > 20 * 1024 * 1024:  # 20 MB
+        compressed_path = os.path.splitext(path)[0] + "_compressed.pdf"
+        print(f"⚡ Compressing {os.path.basename(path)} (>20MB)...")
+        return compress_pdf(path, compressed_path)
+    return path
+# ---------- HELPER: Create Model with Fallback ----------
+def create_model():
+    try:
+        print("⚡ Using gemini-2.5-pro model")
+        return genai.GenerativeModel("gemini-2.5-pro", generation_config={"temperature": 0})
+    except Exception:
+        print("⚡ Falling back to gemini-2.5-flash model")
+        return genai.GenerativeModel("gemini-2.5-flash", generation_config={"temperature": 0})
+# ---------- PIPELINE: ALIGN + GRADE ----------
+def align_and_grade(qp_file, ms_file, ans_file):
+    try:
+        # Ensure files are <20MB
+        qp_file = check_and_compress(qp_file)
+        ms_file = check_and_compress(ms_file)
+        ans_file = check_and_compress(ans_file)
+        # Uploads
+        qp_uploaded = genai.upload_file(path=qp_file, display_name="Question Paper")
+        ms_uploaded = genai.upload_file(path=ms_file, display_name="Markscheme")
+        ans_uploaded = genai.upload_file(path=ans_file, display_name="Answer Sheet")
+        model = create_model()
+        # Step 1: Alignment
+        resp = model.generate_content([
+            PROMPTS["ALIGNMENT_PROMPT"]["content"],
+            qp_uploaded,
+            ms_uploaded,
+            ans_uploaded
+        ])
+        aligned_text = getattr(resp, "text", None)
+        if not aligned_text and resp.candidates:
+            aligned_text = resp.candidates[0].content.parts[0].text
+        aligned_pdf_path = save_as_pdf(aligned_text, "aligned_qp_ms_as.pdf")
+        # Step 2: Grading (automatic)
         response = model.generate_content([
+            PROMPTS["GRADING_PROMPT"]["content"],
+            aligned_text
         ])
         grading = getattr(response, "text", None)
         if not grading and response.candidates:
             grading = response.candidates[0].content.parts[0].text
+        # Save grading report with student's answer filename
+        base_name = os.path.splitext(os.path.basename(ans_file))[0]
+        grading_pdf_path = save_as_pdf(grading, f"{base_name}_graded.pdf")
+        return aligned_text, aligned_pdf_path, grading, grading_pdf_path
     except Exception as e:
+        return f"❌ Error: {e}", None, None, None
 # ---------- GRADIO APP ----------
+with gr.Blocks(title="LeadIB AI Grading (Alignment + Auto-Grading)") as demo:
+    gr.Markdown("## LeadIB AI Grading\nUpload Question Paper, Markscheme, and Student Answer Sheet.\nThe system will align and grade automatically.")
     with gr.Row():
         qp_file = gr.File(label="Upload Question Paper (PDF)", type="filepath")
+        ms_file = gr.File(label="Upload Markscheme (PDF)", type="filepath")
         ans_file = gr.File(label="Upload Student Answer Sheet (PDF)", type="filepath")
+    run_btn = gr.Button("Start Alignment + Auto-Grading")
     with gr.Row():
+        aligned_out = gr.Textbox(label="📄 Aligned QP | MS | AS", lines=20)
+        aligned_pdf = gr.File(label="⬇️ Download Aligned (PDF)")
     with gr.Row():
+        grading_out = gr.Textbox(label="✅ Grading Report", lines=20)
+        grading_pdf = gr.File(label="⬇️ Download Grading Report (PDF)")
+    run_btn.click(
+        fn=align_and_grade,
+        inputs=[qp_file, ms_file, ans_file],
+        outputs=[aligned_out, aligned_pdf, grading_out, grading_pdf],
         show_progress=True
     )
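
Note on the model fallback: `create_model` in this commit wraps only the construction of `genai.GenerativeModel` in `try/except`. Quota or availability errors usually surface when a request is sent, so a fallback to `gemini-2.5-flash` probably needs to wrap the `generate_content` call rather than the constructor. The following is a minimal, library-free sketch of that idea, not the author's implementation. `generate_with_fallback`, `call_model`, and `fake_call` are hypothetical names introduced here for illustration; only the two model names come from the diff.

```python
from typing import Callable, List, Sequence, Tuple


def generate_with_fallback(
    model_names: Sequence[str],
    call_model: Callable[[str], str],
) -> Tuple[str, str]:
    """Try each model name in order and return (model_name, text) from the first success.

    `call_model` is a hypothetical callable that sends the request to the
    named model and returns the response text, raising on failure.
    """
    errors: List[str] = []
    for name in model_names:
        try:
            text = call_model(name)
        except Exception as exc:  # broad on purpose, mirroring the diff's `except Exception`
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        # An empty response is treated as a failure so the next model is tried.
        errors.append(f"{name}: empty response")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_call(name: str) -> str:
    """Stand-in for a real request (assumption: the primary model fails)."""
    if name == "gemini-2.5-pro":
        raise TimeoutError("simulated request failure")
    return f"graded by {name}"


used, text = generate_with_fallback(["gemini-2.5-pro", "gemini-2.5-flash"], fake_call)
print(used, "->", text)
```

In the app, `call_model` could build a `genai.GenerativeModel` for the given name and return the text of `generate_content(...)`, so that a failure of the request itself triggers the fallback. That wiring is an assumption and is not part of this commit.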