Class 30 - CHATBOT FOR DOCUMENTS

Notes from the AI Basic Course by Irfan Malik & Dr Sheraz Naseer (Xeven Solutions)

Today, we move on to LangChain.

Give yourself margin and breathing space, focus on one niche, and leave the rest up to ALLAH.

Direction is very important in life.

If you believe in ALLAH and keep working in the same direction, ALLAH will make paths for you.

Perfection comes with time.

Successful people are those who make their decisions and stand by them.

LangChain is like a bridge for different LLMs.

We will build a bot with this technology.

Document GPT: we are going to create a GPT that works with documents.

Functionality:

1- The user can upload a document.

2- The user can ask any question about the document.

3- The user can generate a summary of the document.

Required Tools:

1- LLM (OpenAI)

2- LangChain

3- Vector database

4- Streamlit

We will make you a practitioner; becoming a scientist is up to you.

LangChain is a Python package.

LangChain was first released in October 2022.

We have to update our curriculum from time to time. Don't fix a 4-year syllabus; for practical courses, revise it every 6 months.

A client will ask you a question:

What is unique about your product or service?

LangChain is like a bridge that connects data sources with LLMs.

It is used for

1- Chatbots

2- Answering questions using sources

3- Data augmentation

A vector database is like a regular database, except that it stores embedding vectors and searches them by similarity.

Streamlit is a front-end Python package.

Data talks to you if you develop the ability to listen.

OpenAI:

It provides the LLM (Large Language Model) we will use.

To avoid exceeding the token limit, divide the data yourself or use a LangChain function.

A loader loads the data depending on the file type.

After loading, suppose we find that the data consists of 30k words.

Then we use splitters to divide the data.
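As a rough illustration of what a splitter does, here is a minimal sketch in plain Python. It is not LangChain's implementation: the function name `split_text` and the `chunk_size` and `overlap` parameters are assumptions made for this example, and sizes are counted in words rather than tokens.

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Neighbouring chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in one of the two chunks.
    (Hypothetical helper for illustration, not a LangChain function.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the document,
        # so we do not emit a trailing chunk made only of overlap.
        if start + chunk_size >= len(words):
            break
    return chunks
```

With these settings, a 30k-word document is divided into 34 chunks, each small enough to be sent to the model separately.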

How our Document GPT will work:

1- Split the document into smaller chunks.

2- Convert the text chunks into embeddings.

3- Perform a similarity search on the embeddings.

4- Generate answers to questions using an LLM.
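The final step above, generating an answer, usually means placing the retrieved chunks and the user's question into one prompt. The sketch below shows one possible way to do that; `build_prompt` and the prompt wording are assumptions for illustration, and the actual call to the LLM is left out.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt string.

    (Hypothetical helper: the wording of the instructions is an assumption,
    not a format required by OpenAI or LangChain.)
    """
    # Number the chunks so the model (and the reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "When is the payment due?",
    ["The invoice is payable within 30 days.", "Late payments incur a fee."],
)
```

The resulting string is what would be sent to the LLM; only the chunks found by the similarity search are included, which keeps the prompt within the token limit.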

Embeddings are represented as vectors.

In technology, don't ignore the small details; doing so leaves gaps in your understanding that will hurt you in the long run.

Vector Database:

Understand this concept.

In meetings with clients, your technical vocabulary matters a lot.

Avoid information overload.

Create vectors from the split data; these are called embedding vectors.

Chunks to Embeddings:

Embeddings are numerical representations that capture the semantic essence of words, phrases, or sentences.
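To make "text becomes a vector" concrete, here is a toy sketch. It is only an assumption-laden illustration: it counts words from a small fixed vocabulary, so it captures word overlap rather than real meaning. Actual embedding models (from Hugging Face or OpenAI) are learned neural networks that produce much richer vectors. The names `embed` and `cosine_similarity` are made up for this example.

```python
import math
from collections import Counter


def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy embedding: one number per vocabulary word (its count in the text)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)


vocabulary = ["invoice", "payment", "due", "cat", "sleeps"]
v_invoice = embed("invoice payment due", vocabulary)
v_payment = embed("payment due", vocabulary)
v_cat = embed("cat sleeps", vocabulary)

# Related texts point in similar directions; unrelated ones do not.
related = cosine_similarity(v_invoice, v_payment)
unrelated = cosine_similarity(v_invoice, v_cat)
```

Here `related` is higher than `unrelated`, which is the property a similarity search relies on.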

Embedding Models:

They take words and produce vectors.

Embedding model providers: Hugging Face and OpenAI

Hugging Face hosts open-source models.

OpenAI is paid.

You can do a lot with Hugging Face: it can help with 80-90% of Upwork projects. Beyond that, you have to make some customizations, and by now you should be competent enough to do so.

Now you can start your journey.

Vector Databases:

FAISS (locally managed)

Elasticsearch (locally managed)

Chroma DB (locally managed)

Qdrant (managed) {Free or Paid}

Pinecone (managed) {Paid}
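All of the options above do the same basic job: store vectors together with their text and return the entries closest to a query vector. The sketch below is a minimal in-memory version of that idea. It is not the API of FAISS, Chroma, Qdrant, or Pinecone; the class `InMemoryVectorStore` and its `add` and `search` methods are assumptions for illustration, and the 3-number vectors are hand-made rather than produced by an embedding model.

```python
import math


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical toy store: keeps (vector, text) pairs in a Python list."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._items.append((list(vector), text))

    def search(self, query: list[float], k: int = 2) -> list[tuple[float, str]]:
        """Return the k stored texts most similar to the query vector."""
        scored = [(_cosine(query, vec), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add([1.0, 0.0, 0.0], "Refund policy")
store.add([0.0, 1.0, 0.0], "Shipping times")
store.add([0.9, 0.1, 0.0], "Return window")

results = store.search([1.0, 0.0, 0.0], k=2)
top_texts = [text for _, text in results]
```

Here `top_texts` contains "Refund policy" followed by "Return window". A real vector database adds persistence and indexes so that the search stays fast with millions of vectors, instead of comparing the query against every entry as this sketch does.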

#AI #artificialintelligence #datascience #irfanmalik #drsheraz #xevensolutions #openai #chatbot #streamlit #hamzanadeem
