Nvidia-Ingest: Multi-modal data extraction

github.com

145 points by mihaid150 5 days ago


hammersbald - 4 days ago

Is there an OCR toolkit or an ML model that can reliably extract tables from invoices?

ixaxaar - 4 days ago

Ah so like NIM is a set of microservices on top of various models, and this is another set of microservices using NIM microservices to do large scale OCR?

And that too integrated with Prometheus, with a 160GB VRAM requirement and so on?

Looks like this is targeted at enterprises or maybe governments, etc., trying to digitize at scale.

- 4 days ago

[deleted]

greatgib - 5 days ago

I have a hard time understanding what they mean by "early access micro services"...?

Does it mean that it is yet another wrapper library to call their proprietary cloud API?

Or that when you have the specific access rights, you can retrieve a proprietary Docker image with secret proprietary binary stuff inside that will be the server used by the library available on GitHub?

joeevans1000 - 4 days ago

How is this different from Elasticsearch and Solr? That’s not any kind of challenging question… I really don’t know that much about these different tools and I just want to know what this one is about.

Also: I noticed that it mentioned images… does it do any kind of OCR or summary of them?

PeterStuer - 4 days ago

Before you get too excited, this needs two A100s or H100s minimum.

OutOfHere - 4 days ago

This requires Nvidia GPUs to run.

The open question is whether to use rule-based parsing using simpler software or model-based parsing using this software.

lyime - 4 days ago

So who is going to deploy this and turn this into a service/API?

UltraSane - 4 days ago

What is the effective $/document of this method?
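
One way to frame the question, as a back-of-envelope sketch only: cost per document is the hourly cost of the GPUs divided by how many documents they get through in an hour. The function name and all three inputs are hypothetical; nothing in the thread gives a real price or throughput figure, so plug in your own.

```python
def cost_per_document(gpu_count, usd_per_gpu_hour, docs_per_hour):
    """Rough $/document: total hourly GPU spend divided by hourly throughput.

    All three arguments are placeholders supplied by the caller;
    none of them are measured values for nv-ingest.
    """
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return gpu_count * usd_per_gpu_hour / docs_per_hour
```

This ignores idle time, storage, and the other containers in the stack, so treat whatever it returns as a lower bound.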

wiradikusuma - 4 days ago

Is this like Nvidia's version of MCP? (https://modelcontextprotocol.io/introduction)

joaquincabezas - 5 days ago

lol, while checking which OCR it is using (PaddleOCR) I found a line with the text: "TODO(Devin)" and was pretty excited thinking they were already using Devin AI...

"Devin Robison" is the author of the package!! Funny, I guess it will be similar to what happened with the name Alexa

vardump - 5 days ago

Sounds pretty useful. What are the system requirements?

  Prerequisites
  Hardware
  GPU    Family        Memory   # of GPUs (min.)
  H100   SXM or PCIe   80GB     2
  A100   SXM or PCIe   80GB     2

Hmm, perhaps this is not for me.
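
For anyone who wants to check a machine against that table, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH, that the card names it reports contain "A100" or "H100", and that an "80GB" card reports a little over 80,000 MiB; the helper names and the threshold are my own, not anything from the nv-ingest repo.

```python
import subprocess

# Thresholds read off the quoted prerequisites table. Matching on the
# substrings "A100"/"H100" and the MiB cutoff are assumptions.
REQUIRED_FAMILIES = ("A100", "H100")
MIN_MEMORY_MIB = 80_000
MIN_GPUS = 2


def parse_gpus(csv_text):
    """Parse 'name, memory_mib' lines into a list of (name, memory_mib)."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, memory = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(memory)))
    return gpus


def meets_requirements(gpus):
    """True if at least MIN_GPUS cards are A100/H100 with enough memory."""
    qualifying = [
        (name, mem)
        for name, mem in gpus
        if any(family in name for family in REQUIRED_FAMILIES)
        and mem >= MIN_MEMORY_MIB
    ]
    return len(qualifying) >= MIN_GPUS


def query_gpus():
    """Ask nvidia-smi for GPU names and total memory (MiB, no units)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpus(result.stdout)
```

Calling `meets_requirements(query_gpus())` on the target box gives a quick yes/no; the parsing is kept separate so it can be tried on pasted output without a GPU present.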

shutty - 5 days ago

Wow, perhaps I need a Kubernetes cluster just for a demo:

    CONTAINER ID   IMAGE
    0f2f86615ea5   nvcr.io/ohlfw0olaadg/ea-participants/nv-ingest:24.10
    de44122c6ddc   otel/opentelemetry-collector-contrib:0.91.0
    02c9ab8c6901   nvcr.io/ohlfw0olaadg/ea-participants/cached:0.2.0
    d49369334398   nvcr.io/nim/nvidia/nv-embedqa-e5-v5:1.1.0
    508715a24998   nvcr.io/ohlfw0olaadg/ea-participants/nv-yolox-structured-images-v1:0.2.0
    5b7a174a0a85   nvcr.io/ohlfw0olaadg/ea-participants/deplot:1.0.0
    430045f98c02   nvcr.io/ohlfw0olaadg/ea-participants/paddleocr:0.2.0
    8e587b45821b   grafana/grafana
    aa2c0ec387e2   redis/redis-stack
    bda9a2a9c8b5   openzipkin/zipkin
    ac27e5297d57   prom/prometheus:latest

foxhop - 4 days ago

[dead]

jappgar - 4 days ago

Nvidia getting in on the lucrative gpt-wrapper market.