Crawling YouTube Community Post Text

Jun's N 2022. 8. 18. 19:56

# Assumed imports (not shown in the original post)
import json
import re

import pandas as pd
from bs4 import BeautifulSoup as bs
from requests_html import HTMLSession

# Create an empty DataFrame to accumulate the results
insert_df = pd.DataFrame()
n_counts = 0  # number of requests sent so far

# df_channel: DataFrame of channel IDs (must contain a 'channel_id' column)

for channel_id in df_channel['channel_id']:
    try:
        video_url = "https://www.youtube.com/channel/{}/community".format(channel_id)
        session = HTMLSession()
        response = session.get(video_url)
        n_counts += 1
        if response.status_code == 429:  # Too Many Requests: we are being rate limited
            print(response)
        soup = bs(response.html.html, "html.parser")
        # The page data is embedded as JSON in a "var ytInitialData = {...};" script
        data = re.search(r"var ytInitialData = ({.*?});", soup.prettify()).group(1)
        data_json = json.loads(data)
        # tabs[3] is taken to be the Community tab; the index may differ between channels
        commu = data_json['contents']['twoColumnBrowseResultsRenderer']['tabs'][3]['tabRenderer']['content']['sectionListRenderer']['contents'][0]['itemSectionRenderer']['contents']
        text = []
        day = []
        # The last two entries of the list are skipped, as in the original code
        for i in range(0, len(commu) - 2):
            post = commu[i]['backstagePostThreadRenderer']['post']['backstagePostRenderer']
            # A post body is split into one or more "runs"; join them into a single string
            text.append(''.join(run['text'] for run in post['contentText']['runs']))
            day.append(post['publishedTimeText']['runs'][0]['text'])

        df = pd.DataFrame()  # holds the community posts of this channel
        df['community'] = text  # post text
        df['channel_id'] = channel_id  # channel ID
        df['date'] = day  # publication time as displayed on the page
        insert_df = pd.concat([insert_df, df])  # append to the accumulated DataFrame
    except Exception:
        # Skip channels whose page could not be fetched or parsed
        continue
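
The two most fragile steps in the loop above are pulling the JSON out of the page and reading the text of each post. The following is a minimal sketch of how both could be made more defensive, not the author's method. It assumes the page contains the literal marker `var ytInitialData = ` immediately followed by a JSON object, and that posts use the same `backstagePostThreadRenderer` keys as in the code above. The helper names `extract_initial_data` and `post_text`, the `otherRenderer` key, and the sample data are hypothetical and exist only for illustration.

```python
import json

MARKER = "var ytInitialData = "


def extract_initial_data(html: str) -> dict:
    """Return the JSON object that directly follows MARKER in html."""
    # str.index raises ValueError if the marker is missing
    start = html.index(MARKER) + len(MARKER)
    # raw_decode parses exactly one JSON value and ignores what follows it,
    # so a "};" inside a string value cannot cut the match short the way
    # a non-greedy regular expression can.
    data, _end = json.JSONDecoder().raw_decode(html, start)
    return data


def post_text(item: dict):
    """Return the joined text of one post, or None if item has no text runs."""
    post = (
        item.get("backstagePostThreadRenderer", {})
        .get("post", {})
        .get("backstagePostRenderer", {})
    )
    runs = post.get("contentText", {}).get("runs")
    if not runs:
        return None
    return "".join(run.get("text", "") for run in runs)


# Synthetic data for illustration only; this is not real YouTube output.
sample_data = {
    "note": "a};b",  # contains "};", which would truncate a non-greedy regex match
    "items": [
        {"backstagePostThreadRenderer": {"post": {"backstagePostRenderer": {
            "contentText": {"runs": [{"text": "Hello, "}, {"text": "world"}]}}}}},
        {"otherRenderer": {}},  # hypothetical non-post entry
    ],
}
sample_html = "<script>" + MARKER + json.dumps(sample_data) + ";</script>"

parsed = extract_initial_data(sample_html)
texts = [post_text(item) for item in parsed["items"]]
print(texts)  # ['Hello, world', None]
```

Returning `None` for entries that are not text posts lets the caller filter them out explicitly, instead of relying on a fixed offset such as `len(commu) - 2`.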

