Why not both? Have one overall process run the document through OCR, run it through a VLM, diff the two outputs, embed a confidence score in the metadata, and link back to the source. I do think we need to stop thinking any process ...
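To make the "diff the outputs" idea concrete, here is a minimal sketch in Python. It assumes the OCR and VLM steps have already produced plain strings; the names `Extraction` and `reconcile` are hypothetical, and using a token-level similarity ratio as the "confidence" is just one possible choice, not an established method.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Extraction:
    """Reconciled text plus metadata describing how it was obtained."""
    text: str
    metadata: dict = field(default_factory=dict)


def reconcile(ocr_text: str, vlm_text: str, source: str) -> Extraction:
    """Diff two extractions of the same page and record how much they agree.

    Hypothetical helper: both inputs are plain strings produced elsewhere.
    """
    ocr_tokens = ocr_text.split()
    vlm_tokens = vlm_text.split()
    # Compare whitespace-separated tokens rather than characters so that
    # each disagreement maps to a readable word or number.
    matcher = SequenceMatcher(a=ocr_tokens, b=vlm_tokens, autojunk=False)

    # Collect every span where the two readings differ.
    disagreements = [
        {"ocr": " ".join(ocr_tokens[i1:i2]), "vlm": " ".join(vlm_tokens[j1:j2])}
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

    return Extraction(
        text=ocr_text,  # assumption: the OCR reading is kept as the primary text
        metadata={
            "source": source,  # link back to the original file or page
            "confidence": round(matcher.ratio(), 3),  # 1.0 means full agreement
            "disagreements": disagreements,
        },
    )
```

Calling `reconcile("Invoice total: 1O0.00 USD", "Invoice total: 100.00 USD", "invoice.pdf")` would flag the `1O0.00` / `100.00` token as a disagreement and lower the confidence accordingly, so a reviewer can jump straight to the source for that field.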
Have you ever found yourself spending hours sifting through piles of PDFs, DOCX files, and CSVs, manually extracting the data you need? It’s tedious, right? I’ve been there, and I know how ...