
BigDL-LLM

BigDL-LLM is a low-bit LLM optimization library for Intel XPU (Xeon/Core/Flex/Arc/Max). It lets LLMs run with very low latency and a much smaller memory footprint on Intel platforms, and it is open-sourced under the Apache 2.0 License.

This example goes over how to use LangChain to interact with BigDL-LLM for text generation.

Setup

# Update LangChain

%pip install -qU langchain langchain-community

Install BigDL-LLM to run LLMs locally on an Intel CPU.

# Install BigDL-LLM with all optional dependencies
%pip install --pre --upgrade bigdl-llm[all]
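As an optional sanity check, both packages can be imported to confirm the install succeeded. This is a minimal sketch; `bigdl.llm` is the package's documented import namespace.

# Optional sanity check: confirm both packages import cleanly
import langchain_community
import bigdl.llm  # noqa: F401

print("bigdl-llm and langchain-community installed")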

Usage

from langchain.chains import LLMChain
from langchain_community.llms.bigdl import BigdlLLM
from langchain_core.prompts import PromptTemplate

template = "USER: {question}\nASSISTANT:"
prompt = PromptTemplate(template=template, input_variables=["question"])
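To inspect the exact string the template produces before wiring it into a chain, `PromptTemplate.format` renders it directly (the question text here is just an illustration):

# Render the template to see the final prompt string
print(prompt.format(question="What is AI?"))
# USER: What is AI?
# ASSISTANT: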

Load the model:

llm = BigdlLLM.from_model_id(
    model_id="lmsys/vicuna-7b-v1.5",
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
2024-02-23 18:10:22,896 - INFO - Converting the current model to sym_int4 format......
2024-02-23 18:10:25,415 - INFO - BIGDL_OPT_IPEX: False
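Once loaded, the model can also be called directly, without a chain. `invoke` is the standard entry point on LangChain LLMs; the prompt string below is just an example following the Vicuna chat format:

# Query the model directly with a formatted prompt
response = llm.invoke("USER: What is AI?\nASSISTANT:")
print(response)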

Use it in chains:

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What is AI?"
output = llm_chain.invoke({"question": question})
AI stands for "Artificial Intelligence." It refers to the development of computer systems that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI can be achieved through a combination of techniques such as machine learning, natural language processing, computer vision, and robotics. The ultimate goal of AI research is to create machines that can think and learn like humans, and can even exceed human capabilities in certain areas.
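LLMChain is being superseded by LangChain Expression Language (LCEL) composition. As a sketch, the same chain can be written by piping the prompt into the model; here `invoke` returns the generated string directly rather than a dict:

# Equivalent chain using LCEL pipe syntax
chain = prompt | llm
response = chain.invoke({"question": "What is AI?"})
print(response)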
