Running AI workloads on Intel's NPU: language model edition

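Preface

Lately the list of things I want to try keeps growing while I barely make a dent in it. Hi, Uguisu here.

One item on that list is AI processing on Intel's NPU, and I finally got it working reasonably well, so I'm writing this post as a memo to myself.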

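What is an NPU?

An NPU is an accelerator specialized for AI workloads. In other words, you can think of it as a compute unit that runs AI processing very efficiently.

When it comes to NPUs built into the CPU, AMD's Ryzen got there first. Intel is the latecomer, but from my brief time with it, Intel felt more approachable. I like that Intel makes it fairly easy for individuals to set up a development environment.

Whether you need to buy one right now is debatable. I bought mine fully intending to use it as a development toy, but at the moment most software still can't use the NPU, so waiting a while is a reasonable choice.

I also read in an online article that the next generation of built-in NPUs will be much more powerful, so buying one of those might be better value.

On the NPU front, laptops with Qualcomm's Snapdragon X Elite will probably go on sale by summer, so those could be an option too.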

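Setting up the development environment

Let's start with the development environment. The setup itself is extremely easy: all you need is a Python environment.

For the record, I develop on Windows. I couldn't bring myself to do something as reckless as putting Ubuntu on the laptop I use every day.

The steps are simple.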

1. Install Python
2. Install the Intel NPU Acceleration Library

Those two steps are all it takes. Super easy.

Step 1 is just running the installer, so I won't explain it here. Look it up online.

For step 2, open Command Prompt or a similar terminal and run the following command.

pip install intel-npu-acceleration-library

Now we're ready to use the NPU. Let's get it running.
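If you want to confirm that pip actually installed the package, something like the following should do. This is a minimal sketch using only the standard library; installed_version is a helper name I made up for illustration, and it only checks that the package is installed, not that the NPU itself is usable.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The distribution name is the one passed to pip install above.
version = installed_version("intel-npu-acceleration-library")
print(version if version else "not installed")
```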

Running the sample programs

First, let's run the program published on GitHub.

Here it is:

from intel_npu_acceleration_library.backend import MatMul
import numpy as np

inC, outC, batch = ... # Define your own values

# Create both inputs
X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)

result = mm.run(X1, X2)

It says "Define your own values", so I picked 1, 1, 2 and ran it. The result:

[[ 0.04938]
 [-0.2146 ]]
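To sanity-check a result like this on the CPU, a NumPy reference can help. The sketch below assumes that MatMul treats X2 as an (outC, inC) weight matrix and computes X1 @ X2.T, which is what the (2, 1) output shape suggests; I have not confirmed this against the library source. reference_matmul is a hypothetical helper name.

```python
import numpy as np


def reference_matmul(x1, x2):
    """CPU reference, assuming the NPU op computes x1 @ x2.T (not verified)."""
    # Accumulate in float32, then cast back to float16 like the NPU inputs.
    out = x1.astype(np.float32) @ x2.astype(np.float32).T
    return out.astype(np.float16)


inC, outC, batch = 1, 1, 2  # the values used in this post

rng = np.random.default_rng(0)
X1 = rng.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = rng.uniform(-1, 1, (outC, inC)).astype(np.float16)

expected = reference_matmul(X1, X2)
print(expected.shape)  # (2, 1)
```

With the same X1 and X2 passed to mm.run, comparing the two with np.allclose and a loose tolerance (float16 is not very precise) would show whether the assumption holds.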

No problems there. Next, let's run an LLM. Here is the sample program.

from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM
import intel_npu_acceleration_library
import torch

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, use_default_system_prompt=True)
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)


print("Compile model for the NPU")
model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

query = input("Ask something: ")
prefix = tokenizer(query, return_tensors="pt")["input_ids"]


generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

print("Run inference")
_ = model.generate(**generation_kwargs)

On the first run, the model gets downloaded. After that you get an "Ask something: " prompt, so type something. And then:

    causal_mask = causal_mask[:, :, cache_position, : key_states.shape[-2]]
                  ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index 4 is out of bounds for dimension 0 with size 1

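An error. It looks like some kind of dimension mismatch.

Well, an error means it's time to investigate.

Let's print causal_mask. To do that, I edited the relevant spot in the library's source, in a file named llm.py. The code looks like this: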

        if causal_mask is not None and cache_position is not None:
            print(causal_mask)
            print(cache_position)
            causal_mask = causal_mask[:, :, cache_position, : key_states.shape[-2]]
            print('-------------------')

Running that gives:

# Earlier call (no error)
tensor([[[[ 0.0000e+00, -3.4028e+38, -3.4028e+38, -3.4028e+38],
          [ 0.0000e+00,  0.0000e+00, -3.4028e+38, -3.4028e+38],
          [ 0.0000e+00,  0.0000e+00,  0.0000e+00, -3.4028e+38],
          [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00]]]])
tensor([0, 1, 2, 3])

# Call that raises the error
me tensor([[[[-0., -0., -0., -0., -0.]]]])
tensor([4])

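That makes it obvious. In the failing call the mask has only a single row (shape 1 x 1 x 1 x 5), so it cannot be indexed with cache_position 4. The shapes simply don't match.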
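The mismatch can be reproduced outside the library. The sketch below uses NumPy arrays as stand-ins for the torch tensors printed above (the real code uses torch, and the exact wording of the error message differs), and select_rows is a hypothetical name for the indexing expression in llm.py.

```python
import numpy as np

NEG = np.finfo(np.float32).min  # about -3.4028e+38, as in the printed mask

# First call: a 4 x 4 causal mask with positions 0..3.
prefill_mask = np.triu(np.full((4, 4), NEG, dtype=np.float32), k=1)[None, None]
prefill_pos = np.arange(4)

# Failing call: a single mask row of length 5 with position 4.
decode_mask = np.zeros((1, 1, 1, 5), dtype=np.float32)
decode_pos = np.array([4])


def select_rows(mask, cache_position, kv_len):
    """Same indexing pattern as the line that fails in llm.py."""
    return mask[:, :, cache_position, :kv_len]


print(select_rows(prefill_mask, prefill_pos, 4).shape)  # (1, 1, 4, 4)

try:
    select_rows(decode_mask, decode_pos, 5)
    failed = False
except IndexError as exc:
    # Axis 2 has size 1, so row index 4 does not exist.
    failed = True
    print("IndexError:", exc)
```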

As for the fix: the offending line rewrites causal_mask using other values, so I figured that when it can't be done, the step can simply be skipped. I changed the code as follows.

        if causal_mask is not None and cache_position is not None:
            try:
                causal_mask = causal_mask[:, :, cache_position, : key_states.shape[-2]]
            except IndexError:
                # The mask has no row for cache_position; keep it unchanged.
                pass

It's a super sloppy implementation that just swallows the error on the spot. Anyway, let's run it like this.
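A slightly less sloppy variant would check the shape explicitly instead of catching the exception. The following is only a sketch under the same assumption as the workaround above, namely that leaving the mask untouched is acceptable when it has no row for cache_position; I have not verified that this is correct for every model. It uses NumPy stand-ins, and slice_mask_if_needed is a hypothetical helper, not something the library provides.

```python
import numpy as np


def slice_mask_if_needed(causal_mask, cache_position, kv_len):
    """Select mask rows only when every requested position exists."""
    if causal_mask is None or cache_position is None:
        return causal_mask
    if int(np.max(cache_position)) < causal_mask.shape[2]:
        return causal_mask[:, :, cache_position, :kv_len]
    # Row index out of range: skip the step, as the try/except version does.
    return causal_mask


prefill = slice_mask_if_needed(np.zeros((1, 1, 4, 4)), np.arange(4), 4)
decode = slice_mask_if_needed(np.zeros((1, 1, 1, 5)), np.array([4]), 5)
print(prefill.shape, decode.shape)  # (1, 1, 4, 4) (1, 1, 1, 5)
```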

As a test, I typed "once upon a time". The result is below. Whether the content is accurate is another matter, but you can see that it generates plausible-looking text.

Once upon a time in ancient Egypt

The sun god, Ra, was the powerful and benevolent god who ruled over the land. He was worshipped by all Egyptians as the creator of the universe and the father of the gods.

Ra was known for his wisdom and his gift of the sun. Every day, he would rise up from the Nile River and bring light to the world. He would spread his light over the land and bring about its fertility and abundance.

Ra's people were the priests of the ancient Egyptians, and they were dedicated to the worship of Ra. They built temples, called temples, to honor Ra, and they performed religious rituals to please him.

The priests and their practices were crucial to the religious and cultural life of ancient Egypt. They believed that Ra's blessings and protection were essential for the
 survival of the people.

In ancient Egyptian belief, there were several gods, including Ra, who controlled the elements of life. Other gods and goddesses were also important in the ancient Egyptian religion.

Ra's most famous son was Anubis, the god of mummification and death. He was often depicted as a jackal, and his job was to mummify the bodies of the dead. Anubis was also responsible for the burial of the deceased.

Another important god in ancient Egypt was Amun, the god of the Sun. He was known for his blessings on farmers, who could produce crops with abundance. Amun was also responsible for rain and other natural disasters.

The ancient Egyptians believed that there was a world beyond the physical world, and that it was controlled by the gods. They believed in the afterlife, and they often gave offerings and prayers to the gods in the hopes that they would guide their souls to the afterlife.

Despite their complex and powerful beliefs, ancient Egyptian people faced many challenges. They were vulnerable to disease and famine, which were frequently caused by natural disasters such as floods and earthquakes.

The decline of ancient Egypt began in the New Kingdom (1550-1070 BC), and it ended with the conquests of Alexander the Great in the 330s BC. After Alexander's death, the
 Macedonians

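Afterword

So, I managed to get an LLM running on the NPU. I didn't expect to have to patch the library to make it work, but it's only ver. 1.0.0, so I suppose that can't be helped.

I also tried getting it to output Japanese, but the quality was garbage, so I plan to see whether I can tune things to improve it.

Specifically, I'm thinking of trying several models and comparing them.

I'd like to try other kinds of AI processing too, and I'll look around for something interesting. It would be nice to build something practical, but I don't have the knack for that, so it's probably not happening.

Thank you for reading. See you next time.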