The solution uses the langchain library to split a large PDF into more manageable chunks. It also uses the retrieval augmented generation technique to let users ask questions about new data that the LLM hasn't seen before.

As shown in the architecture diagram, we use a front end implemented with React JavaScript hosted in an Amazon Simple Storage Service (Amazon S3) bucket fronted by Amazon CloudFront. The front-end application lets users upload PDF documents to Amazon S3. After the upload is complete, you can trigger a text extraction job powered by Amazon Textract. As part of the post-processing, an AWS Lambda function inserts special markers into the text indicating page boundaries. When that job is done, you can invoke an API that summarizes the text or answers questions about it.

Because some of these steps may take some time, the architecture uses a decoupled asynchronous approach. For example, the call to summarize a document invokes a Lambda function that posts a message to an Amazon Simple Queue Service (Amazon SQS) queue. Another Lambda function picks up that message and starts an Amazon Elastic Container Service (Amazon ECS) AWS Fargate task. The Fargate task calls the Amazon SageMaker inference endpoint. We use a Fargate task here because summarizing a very long PDF may take more time and memory than a Lambda function has available. When the summarization is done, the front-end application can pick up the results from an Amazon DynamoDB table.

For summarization, we use AI21's Summarize model, one of the foundation models available through Amazon SageMaker JumpStart. Although this model handles documents of up to 10,000 words (approximately 40 pages), we use langchain's text splitter to make sure that each summarization call to the LLM is no more than 10,000 words long. For text generation, we use Cohere's Medium model, and we use GPT-J for embeddings, both via JumpStart.

When handling larger documents, we need to define how to split the document into smaller pieces.
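To make the 10,000-word limit concrete, here is a minimal sketch of word-bounded splitting in plain Python. It is a stand-in for illustration only, not langchain's text splitter or the code used in this solution. The function name `split_into_chunks` and the choice of a blank line as the paragraph separator are assumptions.

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 10_000) -> List[str]:
    """Greedily pack paragraphs into chunks of at most `max_words` words.

    Hypothetical helper: paragraphs are assumed to be separated by a
    blank line. A paragraph longer than `max_words` is split on word
    boundaries instead.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")

    chunks: List[str] = []
    current: List[str] = []  # paragraphs in the chunk being built
    current_words = 0        # word count of `current`

    def flush() -> None:
        """Emit the chunk being built, if any, and start a new one."""
        nonlocal current, current_words
        if current:
            chunks.append("\n\n".join(current))
        current, current_words = [], 0

    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue  # skip empty paragraphs
        if len(words) > max_words:
            # Oversized paragraph: close the open chunk, then hard-split.
            flush()
            for i in range(0, len(words), max_words):
                chunks.append(" ".join(words[i:i + max_words]))
            continue
        if current_words + len(words) > max_words:
            flush()
        current.append(paragraph)
        current_words += len(words)

    flush()
    return chunks
```

Each returned chunk could then be sent to the summarization endpoint as a separate call. Packing whole paragraphs keeps related sentences together where possible, at the cost of chunks that are sometimes shorter than the limit.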