Instagram Crawling (7) - Crawling Photos with Beautiful Soup

Unknown user, 2022. 7. 10. 21:39


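```python
import time

import pandas as pd
from bs4 import BeautifulSoup as bs
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from tqdm import tqdm

# `driver` is the logged-in Selenium WebDriver set up in the earlier posts of this series.

# Instagram generates these class names, so they change over time and must be re-checked.
LINK_CLASS = 'oajrlxb2 g5ia77u1 qu0x051f esr5mh6w e9989ue4 r7d6kgcz rq0escxv nhd2j8a9 nc684nl6 p7hjln8o kvgmc6g5 cxmmr5t8 oygrvhab hcukyx3x jb3vyjys rz4wbd8a qt6c0cv9 a8nywdso i1ao9s8h esuyzwwr f1sip0of lzcic4wl _a6hd'
IMG_CLASS = '_aagt'

df = pd.DataFrame()
ccnt = 0  # number of scroll rounds so far

for j in range(10):
    # Pause longer every third round to avoid being rate-limited
    if ccnt % 3 == 0:
        time.sleep(10)
    ccnt += 1

    # Press PAGE_DOWN 5 times so that more posts are loaded
    body = driver.find_element(By.TAG_NAME, 'body')  # Selenium 4 locator API
    num_of_pagedowns = 5
    while num_of_pagedowns:
        body.send_keys(Keys.PAGE_DOWN)
        time.sleep(3)
        num_of_pagedowns -= 1

    html0 = driver.page_source  # HTML of the current page
    html = bs(html0, 'html.parser')

    # Search the page once per round instead of once per photo
    pic_id = html.find_all('a', {'class': LINK_CLASS})
    pic_link = html.find_all('img', {'class': IMG_CLASS})

    picture_info = {}
    cnt = 0

    for i in tqdm(range(len(pic_link))):  # never exceed the number of photos on the page
        try:
            result = {}

            # Short pause every 9 photos
            if cnt % 9 == 2:
                time.sleep(3)
            cnt += 1

            # Photo ID: href looks like "/p/<id>/", so element [2] of the split is the ID
            result['picture_id'] = pic_id[i]['href'].split('/')[2]

            # Photo link (thumbnail URL)
            result['picture_link'] = pic_link[i]['src']

            picture_info[i] = result  # store the row; it was previously never saved
        except (IndexError, KeyError):
            # Skip photos whose post link or src attribute is missing
            continue

    # Build one DataFrame per round and merge it, dropping photos already collected
    if picture_info:
        result_df = pd.DataFrame.from_dict(picture_info, orient='index')
        df = pd.concat([df, result_df])
        df = df.drop_duplicates(['picture_id'])
        df = df.reset_index(drop=True)
```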
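The loop above pairs the i-th post link with the i-th `_aagt` image purely by index, so IDs and URLs silently get mismatched if the two lists differ in length or order. The following is a minimal sketch of an alternative, assuming (hypothetically) that each post is an `<a href="/p/<id>/">` that wraps its `<img class="_aagt">` thumbnail: the image is read from inside each link, and several page snapshots are merged while keeping the first row per `picture_id`, which has the same effect as `drop_duplicates(['picture_id'])`. The function names `extract_pictures` and `merge_unique` are illustrative only, and Instagram's markup changes often, so the selectors need to be checked against the live page.

```python
from bs4 import BeautifulSoup


def extract_pictures(page_html):
    """Return [{'picture_id', 'picture_link'}, ...] for one page snapshot.

    Assumption (hypothetical markup): each post is <a href="/p/<id>/">
    wrapping an <img class="_aagt">.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    pictures = []
    for anchor in soup.find_all("a", href=True):
        parts = anchor["href"].split("/")  # "/p/ABC/" -> ["", "p", "ABC", ""]
        if len(parts) < 3 or parts[1] != "p":
            continue  # not a post link
        img = anchor.find("img", class_="_aagt")
        if img is None or not img.get("src"):
            continue  # thumbnail not loaded yet
        pictures.append({"picture_id": parts[2], "picture_link": img["src"]})
    return pictures


def merge_unique(snapshots):
    """Merge page snapshots, keeping the first row seen for each picture_id."""
    seen = {}
    for page_html in snapshots:
        for row in extract_pictures(page_html):
            seen.setdefault(row["picture_id"], row)
    return list(seen.values())  # dicts keep insertion order


if __name__ == "__main__":
    page1 = ('<a href="/p/AAA/"><img class="_aagt" src="a.jpg"></a>'
             '<a href="/p/BBB/"><img class="_aagt" src="b.jpg"></a>')
    page2 = ('<a href="/p/BBB/"><img class="_aagt" src="b.jpg"></a>'
             '<a href="/p/CCC/"><img class="_aagt" src="c.jpg"></a>')
    rows = merge_unique([page1, page2])
    print([r["picture_id"] for r in rows])  # ['AAA', 'BBB', 'CCC']
```

In the Selenium loop, each `driver.page_source` taken after scrolling would be one snapshot, and `pd.DataFrame(merge_unique(snapshots))` would give the same table as the concat-and-deduplicate steps.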

License: Attribution-NonCommercial
