Nvidia-Ingest: Multi-modal data extraction

github.com

145 points by mihaid150 6 months ago


hammersbald - 6 months ago

Is there an OCR toolkit or an ML model that can reliably extract tables from invoices?

ixaxaar - 6 months ago

Ah so like NIM is a set of microservices on top of various models, and this is another set of microservices using NIM microservices to do large scale OCR?

And it is also integrated with Prometheus, has a 160GB VRAM requirement, and so on?

Looks like this is targeted at enterprises, or maybe governments, trying to digitize at scale.

greatgib - 6 months ago

I have a hard time understanding what they mean by "early access microservices"...?

Does it mean that it is yet another wrapper library to call their proprietary cloud API?

Or that, once you have the specific access rights, you can retrieve a proprietary Docker image with secret proprietary binaries inside, which will be the server used by the library available on GitHub?

joeevans1000 - 6 months ago

How is this different from Elasticsearch and Solr? That’s not meant as a challenging question… I really don’t know that much about these different tools and I just want to know what this one is about.

Also: I noticed that it mentioned images… does it do any kind of OCR or summary of them?

PeterStuer - 6 months ago

Before you get too excited, this needs at least two A100s or H100s.

OutOfHere - 6 months ago

This requires Nvidia GPUs to run.

The open question is whether to use rule-based parsing using simpler software or model-based parsing using this software.
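For a sense of what the rule-based side can look like, here is a minimal sketch. It assumes the invoice text has already been extracted and that line items follow a fixed "description, quantity, unit price, line total" layout. Both are assumptions for illustration, and `parse_line_items` is a hypothetical helper, not anything from the nv-ingest repo.

```python
import re
from decimal import Decimal

# Assumed layout (hypothetical): "<description> <qty> <unit price> <line total>"
LINE_ITEM = re.compile(
    r"^(?P<desc>.+?)\s+(?P<qty>\d+)\s+(?P<unit>\d+\.\d{2})\s+(?P<total>\d+\.\d{2})$"
)


def parse_line_items(text):
    """Return the rows that match the assumed layout; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:
            rows.append({
                "desc": match["desc"],
                "qty": int(match["qty"]),
                "unit": Decimal(match["unit"]),
                "total": Decimal(match["total"]),
            })
    return rows


sample = "Invoice 123\nWidget A 2 10.00 20.00\nLong cable 3m 1 5.50 5.50\nTotal 25.50"
rows = parse_line_items(sample)
```

This works only as long as every invoice keeps that layout, which is the usual argument for the model-based route.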

lyime - 6 months ago

So who is going to deploy this and turn this into a service/API?

UltraSane - 6 months ago

What is the effective $/document of this method?
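A back-of-the-envelope way to frame it, as a sketch: hardware-only cost per document is the hourly GPU price times the number of GPUs, divided by documents processed per hour. The function name and the example inputs are placeholders, not measurements. Only the two-GPU minimum comes from the requirements quoted elsewhere in the thread.

```python
def cost_per_document(gpu_hourly_usd, num_gpus, docs_per_hour):
    """Hardware-only cost per document in USD (ignores storage, networking, staff)."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return gpu_hourly_usd * num_gpus / docs_per_hour


# Placeholder inputs (assumptions): $2 per GPU-hour, 2 GPUs, 1000 documents per hour.
estimate = cost_per_document(2.0, 2, 1000)
print(f"${estimate:.4f} per document")
```

The real answer depends on measured throughput for a given document mix, which nobody in the thread has reported.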

wiradikusuma - 6 months ago

Is this like Nvidia's version of MCP? (https://modelcontextprotocol.io/introduction)

joaquincabezas - 6 months ago

lol, while checking which OCR it is using (PaddleOCR) I found a line with the text "TODO(Devin)" and was pretty excited, thinking they were already using Devin AI...

"Devin Robison" is the author of the package!! Funny, I guess it will be similar to what happened with the name Alexa.

vardump - 6 months ago

Sounds pretty useful. What are the system requirements?

  Prerequisites
  Hardware
  GPU    Family        Memory   # of GPUs (min.)
  H100   SXM or PCIe   80GB     2
  A100   SXM or PCIe   80GB     2

Hmm, perhaps this is not for me.

shutty - 6 months ago

Wow, I might need a Kubernetes cluster just for a demo:

    CONTAINER ID   IMAGE
    0f2f86615ea5   nvcr.io/ohlfw0olaadg/ea-participants/nv-ingest:24.10
    de44122c6ddc   otel/opentelemetry-collector-contrib:0.91.0
    02c9ab8c6901   nvcr.io/ohlfw0olaadg/ea-participants/cached:0.2.0
    d49369334398   nvcr.io/nim/nvidia/nv-embedqa-e5-v5:1.1.0
    508715a24998   nvcr.io/ohlfw0olaadg/ea-participants/nv-yolox-structured-images-v1:0.2.0
    5b7a174a0a85   nvcr.io/ohlfw0olaadg/ea-participants/deplot:1.0.0
    430045f98c02   nvcr.io/ohlfw0olaadg/ea-participants/paddleocr:0.2.0
    8e587b45821b   grafana/grafana
    aa2c0ec387e2   redis/redis-stack
    bda9a2a9c8b5   openzipkin/zipkin
    ac27e5297d57   prom/prometheus:latest


jappgar - 6 months ago

Nvidia getting in on the lucrative gpt-wrapper market.