AI Image Validation
X2Earn dApps that send captured photos to an AI service for image analysis should consider adding specific tasks to their AI prompts to detect:
Image quality
Doctored or unrealistic modifications
Photo of a computer screen
Watermarks
Painted or hand-drawn text replacing real data
... or other unrealistic items that a "fake" photo could capture.
Example:
In this example we assume the photo is taken from a dApp running inside VeWorld, so the image will be captured by the user's mobile camera.
The backend of the app would like the AI to return a JSON structure such as:
{
  "evaluation_feasible": true,
  "doctored_unrealistic_score": 0.0,
  "doctored_unrealistic_reasons": [],
  "screen_capture_score": 0.0,
  "screen_capture_reasons": [],
  "watermark_score": 0.0,
  "watermark_reasons": [],
  "watermark_text": "",
  "painted_text_score": 0.0,
  "painted_text_reasons": [],
  "final_label": "clean",
  "final_confidence": 0.0
}
Where:
evaluation_feasible
true if the image quality is sufficient to perform the other checks; false if the quality is poor and the other checks could be unreliable
doctored_unrealistic_score doctored_unrealistic_reasons
A score between 0 and 1 indicating how likely it is that the image contains doctored or unrealistic items. The reasons array will be populated with a summary of the evidence
screen_capture_score screen_capture_reasons
A score between 0 and 1 indicating how likely it is that the image was photographed from a computer screen. The reasons array will be populated with a summary of the evidence
watermark_score watermark_reasons watermark_text
A score between 0 and 1 indicating how likely it is that the image contains watermarks. The reasons array will be populated with a summary of the evidence, and watermark_text will contain the watermark's text if it is legible
painted_text_score painted_text_reasons
A score between 0 and 1 indicating how likely it is that the image contains hand-drawn or painted-on text added by the user. The reasons array will be populated with a summary of the evidence
final_label final_confidence
A final classification label of:
clean
doctored_unrealistic
screen_capture
watermarked
handdrawn
multiple_flags
inconclusive
A final confidence score between 0 and 1 for the chosen classification
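For a TypeScript backend, this response can be modelled with a small type and an acceptance helper. The following is a minimal sketch: the field names mirror the schema above, while the isPhotoAcceptable helper and its 0.5 threshold are illustrative assumptions that each dApp should tune to its own risk tolerance.

// Sketch of the AI response shape described above (assumed TypeScript backend).
type FinalLabel =
  | "clean"
  | "doctored_unrealistic"
  | "screen_capture"
  | "watermarked"
  | "handdrawn"
  | "multiple_flags"
  | "inconclusive";

interface ImageValidationResult {
  evaluation_feasible: boolean;
  doctored_unrealistic_score: number;   // 0-1
  doctored_unrealistic_reasons: string[];
  screen_capture_score: number;         // 0-1
  screen_capture_reasons: string[];
  watermark_score: number;              // 0-1
  watermark_reasons: string[];
  watermark_text: string;               // "" when no legible watermark text
  painted_text_score: number;           // 0-1
  painted_text_reasons: string[];
  final_label: FinalLabel;
  final_confidence: number;             // 0-1
}

// Illustrative acceptance rule: the 0.5 threshold is an assumption to tune per dApp.
function isPhotoAcceptable(r: ImageValidationResult, threshold = 0.5): boolean {
  if (!r.evaluation_feasible) return false;
  return r.final_label === "clean" && r.final_confidence >= threshold;
}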
We can use a multi-stage prompt to guide the AI through these tasks:
Mobile Photo Authenticity Check — Multi-Stage Prompt
Objective:
Given a photo provided by a mobile device, evaluate it through multiple analytical stages to determine:
1. If the photo has been doctored or altered in an unrealistic way.
2. If the photo has been taken from a computer screen rather than being an original capture of a real-world scene.
3. If the photo contains visible or partially obscured watermarks.
Instructions:
You must progress through the stages in sequence. At each stage, clearly indicate whether the photo passes or fails, and explain the reasoning.
Output:
Return only the final JSON object in the specified schema—no extra text.
---
STAGE 1 — Quick Triage (visibility & quality):
1. Is the content visible and in focus enough to evaluate?
2. Are there heavy obstructions, extreme blur, or tiny resolution?
3. If evaluation is not feasible, mark `evaluation_feasible=false` and explain briefly.
---
STAGE 2 — “Doctored / Unrealistic” Screening:
Check for visual signs of synthetic or manipulated content. Consider:
- Physics & geometry: inconsistent shadows, impossible reflections, mismatched perspective/vanishing points, warped straight lines near edits.
- Material cues: plastic-like skin, repeated textures, smeared hair/eyelashes, “melting” edges, duplicated fingers/ears.
- Edge artifacts: halos, cut-out borders, fringing, mismatched depth of field.
- Compression anomalies: localized blockiness/quality shifts suggesting pasted regions.
- Lighting: inconsistent color temperature or specular highlights vs. environment.
- Text & patterns: deformed text/logos, repeated tiling.
- Context coherence: scale mismatches, impossible combinations.
Output:
- `doctored_unrealistic_score` (0–1)
- `doctored_unrealistic_reasons` (bullet list)
---
STAGE 3 — “Photo of a Screen” Screening:
Look for evidence that the subject was displayed on a digital screen and re-photographed:
- Screen structure: visible pixel grid/subpixels, scanlines, PWM/refresh bands, moiré.
- Device clues: bezels, notch, status bar, window chrome, cursor, taskbar, scroll bars.
- Optical clues: rectangular glare, Newton rings, rainbowing consistent with glass.
- Focus/parallax: focus on flat screen surface; keystone perspective of a monitor.
- White point/gamut: uniform backlight glow, overly blue/green whites.
Output:
- `screen_capture_score` (0–1)
- `screen_capture_reasons` (bullet list)
---
STAGE 4 — Watermark / Overlay Detection:
Detect watermarks or ownership/stock overlays that may indicate non-original content or re-use:
- Typical forms: semi-transparent text/logos (“Getty Images”, “Shutterstock”, “Adobe Stock”, creator handles), diagonal repeating patterns, corner logos, date/time stamps that appear composited.
- Visual traits: consistent alpha translucency, uniform repetition across the frame, crisp overlay unaffected by scene lighting/perspective, different resolution/sharpness vs. underlying image.
- Placement: along edges/corners/center diagonals; multiple repeats; patterned tiling.
- Edge cases: legitimate **camera UI overlays** (e.g., timestamp) vs. stock watermarks—distinguish when possible.
- Context: if a watermark is present, note its content (if legible) but **do not identify a person**.
Output:
- `watermark_score` (0–1)
- `watermark_reasons` (bullet list)
- If confidently recognized, add a short `watermark_text` string (e.g., `"Shutterstock"`, `"creator handle @name"`); otherwise empty.
---
STAGE 5 - Detect Hand-Drawn or Painted-On Text:
Detect if the image has text that was manually added using a paint or drawing program, rather than being part of the original scene:
- Look for uneven, non-font-based handwriting or shapes inconsistent with printed text.
- Identify brush strokes, smudging, or digital pen artifacts in the text.
- Detect text that blends poorly with the image background or overlaps objects unnaturally.
- Check whether the resolution and sharpness of the text are inconsistent with the rest of the image.
Output:
- `painted_text_score` (0-1)
- `painted_text_reasons` (bullet list)
---
STAGE 6 — Final Decision & Confidence:
- `evaluation_feasible`: boolean.
- `final_label`: one of `"clean"`, `"doctored_unrealistic"`, `"screen_capture"`, `"watermarked"`, `"handdrawn"`, `"multiple_flags"`, `"inconclusive"`.
- `final_confidence` (0–1): overall confidence in `final_label`.
- Keep reasoning concise; cite visible cues only.
---
OUTPUT — JSON Schema (return only this):
{
  "evaluation_feasible": true,
  "doctored_unrealistic_score": 0.0,
  "doctored_unrealistic_reasons": [],
  "screen_capture_score": 0.0,
  "screen_capture_reasons": [],
  "watermark_score": 0.0,
  "watermark_reasons": [],
  "watermark_text": "",
  "painted_text_score": 0.0,
  "painted_text_reasons": [],
  "final_label": "clean",
  "final_confidence": 0.0
}
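On the backend, this prompt can be sent together with the captured photo to any vision-capable model that returns JSON. The sketch below assumes an OpenAI-compatible chat completions endpoint, an AI_API_KEY environment variable, and the ImageValidationResult type from the earlier sketch; the endpoint, model name, and error handling are assumptions to adapt to your own provider.

// Minimal sketch, assuming an OpenAI-compatible /v1/chat/completions endpoint
// with vision support. Model name, endpoint, and API key handling are assumptions.
const AUTHENTICITY_PROMPT = `...`; // the full multi-stage prompt shown above

async function validatePhoto(jpegBase64: string): Promise<ImageValidationResult> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.AI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",                      // assumed vision-capable model
      response_format: { type: "json_object" },  // ask for JSON-only output
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: AUTHENTICITY_PROMPT },
            {
              type: "image_url",
              image_url: { url: `data:image/jpeg;base64,${jpegBase64}` },
            },
          ],
        },
      ],
    }),
  });

  const data = await res.json();
  // The prompt instructs the model to return only the JSON object, so parse it directly.
  return JSON.parse(data.choices[0].message.content) as ImageValidationResult;
}

The parsed result can then be passed to a helper such as isPhotoAcceptable above, or to whatever scoring thresholds the dApp uses before granting rewards.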