[crawler] naver cafe_board post list

2021. 5. 19. 19:17 · Python/Coding

Ver. Jupyter Notebook (Anaconda3)

▶ crawler_naver cafe_board post list

Collected: post number, title, author, likes

Code: GitHub, JeongJaeyoung0/crawler (github.com)

# crawler_naver cafe_board post list
Step 1. Naver Cafe > board > collect post number, title, author, likes > save (crawler_naver cafe_게시판 {}.xlsx)

pwd

# Notices are still included in the crawl even when they are hidden
# Click "hide notices"
# anno_off = driver.find_element_by_css_selector('.check_box').click()

### Step 0. Setup
import sys    # system
import os     # operating system

import pandas as pd    # pandas: data analysis library
import numpy as np     # numpy: numeric and array library

from bs4 import BeautifulSoup     # HTML parsing
from selenium import webdriver    # web browser automation
import time                       # delays
import math                       # ceil() for the page count

### Step 1. Crawling
keyword = "☰ 통합 Q & A ☰"    # link text of the target board
crawling_no = int(input('Number of posts to crawl: '))

# Launch the Chrome web browser
# Note: this cell uses the Selenium 3 API; Selenium 4 removed find_element_by_*
# in favor of find_element(By.<strategy>, value)
driver = webdriver.Chrome(r"G:\내 드라이브\exe\chromedriver.exe")

# Site address
driver.get("https://cafe.naver.com/noljatravel")
time.sleep(2)

# Click the board
driver.find_element_by_link_text(keyword).click()

# Switch into the board frame
driver.switch_to.frame("cafe_main")

# Show 50 posts per page
driver.find_element_by_css_selector("#listSizeSelectDiv").click()
driver.find_element_by_xpath("/html/body/div[1]/div/div[3]/div/div[3]/ul/li[7]/a").click()

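#crawling_list = []
no_app = []       # post numbers, one list per page
title_app = []    # titles, one list per page
nick_app  = []    # authors, one list per page
like_app  = []    # like counts, one list per page

# Number of pages to crawl, plus 1 so it can serve as the exclusive bound of range()
crawling_page = int(math.ceil(crawling_no / 50)+1)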

try:
    for page in range(1,crawling_page):
        # Click the page number
        driver.find_element_by_link_text(str(page)).click()
        time.sleep(1)
        # Collect post numbers (first token of each cell)
        no = [i.text for i in driver.find_elements_by_css_selector('.td_article')]
        no_split = [ni.split()[0] for ni in no]
        # Collect post titles
        title = [i.text for i in driver.find_elements_by_css_selector('.article')]
        # Collect authors
        nick = [i.text for i in driver.find_elements_by_css_selector('.p-nick .m-tcol-c')]
        # Collect like counts
        like = [i.text for i in driver.find_elements_by_css_selector('.td_likes')]
        # Append the collected data
        no_app.append(no_split)
        title_app.append(title)
        nick_app.append(nick)
        like_app.append(like)
        # Every 10 pages: print progress and click '다음' (next) to load the next page links
        if page % 10 == 0:
            print(page, 'pages crawled')
            driver.find_element_by_link_text('다음').click()
# When no further page exists
except Exception:
    print('No more pages')

driver.quit()
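
The page arithmetic in the loop above can be checked without opening a browser. The following is a minimal sketch that assumes, as the script does, 50 posts per page and a pager that lists page links in groups of 10. `pages_needed` and `needs_next_click` are hypothetical helper names and are not part of the original script.

```python
import math

POSTS_PER_PAGE = 50   # matches the "50 posts per page" setting chosen above
PAGES_PER_GROUP = 10  # assumption: the pager lists 10 page links at a time

def pages_needed(post_count: int, per_page: int = POSTS_PER_PAGE) -> int:
    """Return how many list pages are needed to cover post_count posts."""
    return math.ceil(post_count / per_page)

def needs_next_click(page: int, group: int = PAGES_PER_GROUP) -> bool:
    """True when page is the last link of its pager group (10, 20, ...)."""
    return page % group == 0

print(pages_needed(120))                                 # 3
print([p for p in range(1, 26) if needs_next_click(p)])  # [10, 20]
```

Note that the script collects whole pages, so asking for 120 posts yields up to 150 rows before the notice rows are dropped.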
    
# Flatten the per-page lists into one list each
no_list = sum(no_app, [])
title_list = sum(title_app, [])
nick_list = sum(nick_app, [])
like_list = sum(like_app, [])

# Build the DataFrame (번호 = number, 제목 = title, 작성자 = author, 좋아요 = likes)
df = pd.DataFrame({'번호':no_list,
                   '제목':title_list,
                   '작성자':nick_list,
                   '좋아요':like_list})
# Drop the 필독 (must-read) and 공지 (notice) rows
df = df.drop(df[df['번호'] == '필독'].index)
df = df.drop(df[df['번호'] == '공지'].index)
df = df.reset_index(drop=True)

print(len(df), ' posts crawled.\nCrawling finished.', sep='')

df
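
The flatten-and-filter step can also be tried on its own without pandas. The sketch below swaps `sum(list, [])` for `itertools.chain.from_iterable`, checks that the four columns have the same length (`pd.DataFrame` raises a `ValueError` when they do not), and drops the notice rows. `flatten`, `build_rows`, and the sample data are made up for illustration.

```python
from itertools import chain

NOTICE_LABELS = {"필독", "공지"}  # labels that appear in place of a post number

def flatten(pages):
    """Join a list of per-page lists into one flat list."""
    return list(chain.from_iterable(pages))

def build_rows(no_pages, title_pages, nick_pages, like_pages):
    """Flatten the four columns, verify they line up, and drop notice rows."""
    columns = [flatten(p) for p in (no_pages, title_pages, nick_pages, like_pages)]
    lengths = {len(c) for c in columns}
    if len(lengths) != 1:
        raise ValueError(f"column lengths differ: {sorted(lengths)}")
    return [row for row in zip(*columns) if row[0] not in NOTICE_LABELS]

# Made-up sample data standing in for two crawled pages
rows = build_rows(
    [["공지", "102"], ["101"]],
    [["rules", "second post"], ["first post"]],
    [["admin", "bob"], ["alice"]],
    [["0", "3"], ["1"]],
)
print(rows)
```

A length mismatch usually means one of the CSS selectors matched a different number of elements on some page, so failing early with the lengths in the message makes that easier to spot.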

# Save
df.to_excel('crawler_naver cafe_게시판 {}.xlsx'.format(keyword))
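
The board name goes into the file name unchanged. The keyword used here contains no reserved characters, but a board name with `/` or `?` may make the save fail, especially on Windows. A minimal sketch of cleaning the name first follows; `safe_filename` is a hypothetical helper, not part of the original script.

```python
import re

# Characters that Windows does not allow in file names
_RESERVED = re.compile(r'[\\/:*?"<>|]')

def safe_filename(board_name: str, replacement: str = "_") -> str:
    """Replace reserved characters and trim surrounding whitespace."""
    return _RESERVED.sub(replacement, board_name).strip()

keyword = "☰ 통합 Q & A ☰"
print('crawler_naver cafe_게시판 {}.xlsx'.format(safe_filename(keyword)))
print(safe_filename("Q/A: tips?"))  # Q_A_ tips_
```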
