Text Generation with an RNN 2

Unknown user 2022. 5. 25. 00:00

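Model design

Embedding vector dimension: 10

Hidden state size: 32

At the final time step, this model predicts one word out of all possible words, so it performs multi-class classification.

Because this is a multi-class classification problem, the output layer uses the softmax function as its activation.

The loss function is cross-entropy.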
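
To make the softmax and cross-entropy step concrete, here is a minimal pure-Python sketch on a made-up three-word vocabulary. The scores and the target index are illustrative assumptions, not values from the trained model. It only shows the kind of computation that `activation='softmax'` and `loss='categorical_crossentropy'` stand for.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Loss against a one-hot target: -log of the probability of the true class."""
    return -math.log(probs[target_index])

# Hypothetical scores for a 3-word vocabulary (illustrative values only)
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = cross_entropy(probs, target_index=0)
print(probs, loss)
```

The loss shrinks toward 0 as the probability assigned to the correct word approaches 1, which is what training pushes the model toward.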

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, SimpleRNN

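# vocab_size, X, and y are assumed to come from the preprocessing step in Part 1
embedding_dim = 10  # embedding vector dimension
hidden_units = 32   # hidden state size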

model = Sequential()
model.add(Embedding(vocab_size, embedding_dim))
model.add(SimpleRNN(hidden_units))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=200, verbose=2)
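
As a sanity check on the layer sizes, the trainable parameter counts can be worked out by hand. The sketch below assumes default Keras layer settings (bias terms enabled) and a hypothetical `vocab_size` of 12, since the real value comes from the tokenizer built during preprocessing. The helper name `count_parameters` is made up for this example.

```python
def count_parameters(vocab_size, embedding_dim=10, hidden_units=32):
    """Count trainable parameters for Embedding -> SimpleRNN -> Dense."""
    # Embedding: one vector of size embedding_dim per vocabulary index
    embedding = vocab_size * embedding_dim
    # SimpleRNN: input weights + recurrent weights + bias
    rnn = embedding_dim * hidden_units + hidden_units * hidden_units + hidden_units
    # Dense: weights from the hidden state to every word + bias
    dense = hidden_units * vocab_size + vocab_size
    return {
        "embedding": embedding,
        "rnn": rnn,
        "dense": dense,
        "total": embedding + rnn + dense,
    }

# vocab_size=12 is an assumed value for illustration
print(count_parameters(12))
# {'embedding': 120, 'rnn': 1376, 'dense': 396, 'total': 1892}
```

These numbers can be compared against `model.summary()` once the model has been built.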

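# Define the sentence generation function

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def sentence_generation(model, tokenizer, current_word, n): # model, tokenizer, current word, number of repetitions
    init_word = current_word
    sentence = ''

    # Repeat n times
    for _ in range(n):
        # Integer-encode the current text and pad it to the model's input length
        # (maxlen=5 must match the input length used during training)
        encoded = tokenizer.texts_to_sequences([current_word])[0]
        encoded = pad_sequences([encoded], maxlen=5, padding='pre')
        # Predict Y for the input X (current text) and store the predicted word index in result
        result = model.predict(encoded, verbose=0)
        result = int(np.argmax(result, axis=1)[0])

        for word, index in tokenizer.word_index.items():
            # Stop at the word whose index equals the predicted index
            if index == result:
                break

        # Current text + ' ' + predicted word becomes the new current text
        current_word = current_word + ' ' + word

        # Append the predicted word to the sentence
        sentence = sentence + ' ' + word

    sentence = init_word + sentence
    return sentence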

print(sentence_generation(model, tokenizer, '경마장에', 4)) # 경마장에 있는 말이 뛰고 있다
print(sentence_generation(model, tokenizer, '그의', 2)) # 그의 말이 법이다
print(sentence_generation(model, tokenizer, '가는', 5)) # 가는 말이 고와야 오는 말이 곱다

If n is set higher than the values used above, the model has never learned which word follows '있다', '법이다', or '곱다', so its prediction is arbitrary.
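
The padding step inside `sentence_generation` determines how the growing text is fed back into the model. The following pure-Python sketch mimics what `pad_sequences(..., maxlen=5, padding='pre')` does with its default `truncating='pre'` setting: short sequences are left-padded with zeros, and sequences longer than `maxlen` lose their earliest tokens. The function name `pad_pre` and the integer IDs are made up for illustration.

```python
def pad_pre(sequence, maxlen=5, value=0):
    """Left-pad with `value`, or keep only the last `maxlen` items if too long."""
    trimmed = sequence[-maxlen:]
    return [value] * (maxlen - len(trimmed)) + trimmed

# Hypothetical word indices (illustrative values only)
print(pad_pre([2]))                       # [0, 0, 0, 0, 2]
print(pad_pre([2, 3, 1]))                 # [0, 0, 2, 3, 1]
print(pad_pre([4, 5, 6, 7, 8, 9, 10]))    # [6, 7, 8, 9, 10]
```

So once the text passed to the model is longer than five tokens, only the five most recent tokens are used for the next prediction.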

Attribution-NonCommercial
