{"id":82871,"title":"Reducto - Structured data from any unstructured document","tagline":"Reducto’s new API makes it easy to turn complex documents and spreadsheets into structured data that fits your schema, with zero fine-tuning.","body":"![uploaded image](/media/?type=post\u0026id=82871\u0026key=user_uploads/850889/1ce99361-5539-4e75-809d-d432d24470a2)\n\n[Reducto](https://reducto.ai) is a team from MIT building vision models to turn complex documents into LLM-ready inputs. Every part of our core API was built to offer the most accurate extraction possible, and we now power document ingestion for leading AI teams ranging from startups to Fortune 10 enterprises.\\\n\\\nWe’re excited to release a new **Structured Extraction API**, which combines LLMs with our vision models to extract the data you need with exceptional accuracy and flexibility.\n\n# **📃 The Problem**\n\n**Nearly 80% of enterprise data is in unstructured formats like PDFs and spreadsheets**\\\nPDFs are the status quo for enterprise knowledge in nearly every industry, but they store information in a structure that’s impractical for digital workflows, which leads to [dozens of wasted hours every week](https://www.reducto.ai/blog/the-real-cost-of-manual-document-processing).\\\n\\\n**Custom extraction models require hundreds of hours to build and maintain**\n\nCompanies often use traditional solutions to build a custom extraction pipeline for _every_ document layout they work with. Each pipeline takes dozens of hours of labeling and training, and it needs constant maintenance because models break whenever layouts change.\\\n\\\n**LLMs offer better flexibility at the cost of reliability**\n\nOff-the-shelf LLMs _can_ offer exceptional reasoning, but they struggle with hallucinations and extraction inaccuracies, which makes them unreliable for production use cases.
\n\n# **✅ Our Solution**\n\nWe’ve built vision models that read documents the way a human would, along with a language model that we trained for schema-based extraction. Our new model can handle significantly larger documents and is trained to _cite_ the source of each piece of information, so you can audit and verify outputs easily.\n\n![uploaded image](/media/?type=post\u0026id=82871\u0026key=user_uploads/850889/b090586c-1014-4293-a5f3-266950b46338)\n\nThis means you can:\n\n* Extract important fields with simple, natural-language instructions\n* Verify any information using our source citations\n* Build powerful automations by integrating Reducto’s API into your custom workflows\n\n# **🚀 Automate your unstructured data processing**\n\nOur API is live in production with leading teams across insurance, healthcare, and finance, and we would love to work with you to improve your unstructured data ingestion.\\\n\\\nThis new API builds on all of the work we’ve put into improving our document understanding models, so structured extraction works across all layouts with best-in-class accuracy.
\\\n\\\nYou can [sign up](https://reducto.ai/pricing) to get started right away, or reach out to us at [founders@reducto.ai](mailto:founders@reducto.ai) for a demo!","slug":"LYd-reducto-structured-data-from-any-unstructured-document","created_at":"2024-08-09T14:58:47.814Z","updated_at":"2026-07-22T00:27:59.770Z","total_vote_count":28,"url":"https://www.ycombinator.com/launches/LYd-reducto-structured-data-from-any-unstructured-document","share_image_url":"https://www.ycombinator.com/media/?type=post\u0026id=82871\u0026key=user_uploads/850889/1ce99361-5539-4e75-809d-d432d24470a2","company":{"id":29254,"name":"Reducto","slug":"reducto","url":"https://reducto.ai","logo":"https://bookface-images.s3.amazonaws.com/small_logos/9794d4fb523375d125992ce9fa5e9e9db45daf9a.png","batch":"Winter 2024","industry":"B2B","tags":["Documents","Data Engineering","Enterprise Software","Search","AI"],"search_path":"https://bookface.ycombinator.com/company/29254"}}