Printing a list of articles from Naver News
Exercise) Parse the site below and extract its text
Conditions)
- Site URL: http://new-collar.kr/data_set/sample_text.html
Result)
A society is a group of individuals involved in persistent social interaction or a large social group sharing the same spatial or social territory, typically subject to the same political authority and dominant cultural expectations. Societies are characterized by patterns of relationships (social relations) between individuals who share a distinctive culture and institutions; a given society may be described as the sum total of such relationships among its constituent members. In the social sciences, a larger society often exhibits stratification or dominance patterns in subgroups.
Code)
import requests as req
from bs4 import BeautifulSoup as bs
url = 'http://new-collar.kr/data_set/sample_text.html'
page = req.get(url)
soup = bs(page.text, 'html.parser')
info = soup.select('p.pb-3.mb-0.small.lh-sm.border-bottom')
text = info[0].get_text() # select() always returns a list of matches, so take the first one; loop over info if the page has several
text
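The selector step above can be tried without any network access. The following is a minimal sketch, assuming a made-up HTML snippet that stands in for the sample page (the real markup may differ). It shows that a compound class selector only matches elements carrying every listed class, that select() returns a list, and that select_one() returns just the first match.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the sample page; the real markup may differ.
html = """
<div>
  <p class="pb-3 mb-0 small lh-sm border-bottom">First paragraph.</p>
  <p class="pb-3 mb-0 small">Missing two classes, so not matched.</p>
  <p class="pb-3 mb-0 small lh-sm border-bottom">Second paragraph.</p>
</div>
"""

selector = "p.pb-3.mb-0.small.lh-sm.border-bottom"
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every element that has all five classes.
matches = soup.select(selector)
texts = [p.get_text(strip=True) for p in matches]

# select_one() returns only the first match (or None when nothing matches).
first = soup.select_one(selector)

print(len(matches), texts)
```

When only one element is expected, select_one() avoids the info[0] indexing, at the cost of having to handle None if the page layout changes.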
Problem) Print a list of Naver News articles that meets the conditions below
Conditions)
- Modules: requests, beautifulsoup
- Search term: 손흥민 (Son Heung-min)
Result)
[('\'인종차별\' 재차 사과한 벤탄쿠르 "손흥민과 해결…오해였다"',
'https://www.hankyung.com/article/2024062243047'),
('\'손흥민 인종차별\' 동료 두번째 사과문 "손과 대화, 함께 해결"',
'https://www.joongang.co.kr/article/25258211'),
('“손흥민만 언급했다” 인종차별 논란 벤탄쿠르 재차 사과',
'https://www.seoul.co.kr/news/international/europe/2024/06/22/20240622500004?wlog_tag3=naver'),
('손흥민은 용서했지만…인종차별 발언한 벤탄쿠르, FA 징계 받을 듯',
'https://www.news1.kr/articles/5454537'),
('“우린 형제”…‘인종차별’ 발언 팀 동료 사과 받아들인 대인배 손흥민',
'https://www.mk.co.kr/article/11047621'),
('토트넘 댓글창에 "No Korean"? 피해자는 손흥민인데‥',
'https://imnews.imbc.com/news/2024/world/article/6609667_36445.html'),
('손흥민 열혈팬 할머니, 손 꼭 잡고 "왜 이리 말랐어"(영상)',
'https://www.newsis.com/view/NISX20240620_0002779953'),
('‘인종차별’ 벤탄쿠르, 손흥민 용서에도 FA 징계 전망',
'https://biz.chosun.com/sports/2024/06/21/YQYF5HPIHJE5LEYHVRESAEIR5Q/?utm_source=naver&utm_medium=original&utm_campaign=biz'),
('손흥민, 前에이전트와 계약서 분쟁…2심서도 사실상 승소',
'https://www.yna.co.kr/view/AKR20240619120000004?input=1195m'),
('수지·송혜교, 친분샷 또 떴다…손흥민 모자 쓰고 미소',
'http://starin.edaily.co.kr/news/newspath.asp?newsid=01689206638924344')]
Code)
import requests
from bs4 import BeautifulSoup as bs
h = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36'}
soups=[]
n_news = 10
for i in range(0,n_news):
    # query= carries the URL-encoded search term 손흥민; start= is the 1-based index of the first result on the page
    url_i = 'https://s.search.naver.com/p/newssearch/search.naver?cluster_rank=1&de=&ds=&eid=&field=0&force_original=&is_dts=0&is_sug_officeid=0&mynews=0&news_office_checked=&nlu_query=&nqx_theme=%7B%22theme%22%3A%7B%22main%22%3A%7B%22name%22%3A%22corporation_hq%22%7D%2C%22sub%22%3A%5B%7B%22name%22%3A%22corporation_list%22%7D%2C%7B%22name%22%3A%22stock%22%7D%5D%7D%7D&nso=%26nso%3Dso%3Ar%2Cp%3Aall%2Ca%3Aall&nx_and_query=&nx_search_hlquery=&nx_search_query=&nx_sub_query=&office_category=0&office_section_code=0&office_type=0&pd=0&photo=0&query=%EC%86%90%ED%9D%A5%EB%AF%BC&query_original=&service_area=0&sort=0&spq=0&start='
    num = str(1+i*10) # 1, 11, 21, ...
    url_e = '&where=news_tab_api&nso=so:r,p:all,a:all'
    url = url_i+num+url_e
    page = requests.get(url,headers=h)
    soup = bs(page.text, 'html.parser')
    soups.append(soup)
news = []
for i in range(0,n_news):
    articles = soups[i].select('li') # fetch the news list items; in a selector '#' marks an id and '.' marks a class
    for article in articles:
        try:
            a1 = article.select('div > div > div')[4].select('a')[1].text
            a2 = article.select('div > div > div')[4].select('a')[1]['href'].replace('\\"','')
            news.append((a1,a2))
        except (IndexError, KeyError):
            # skip list items that do not have the expected article layout
            continue
news
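The hand-pasted URL above hides the two parts that actually change: the search term and the start offset. As a rough sketch, both can be generated with the standard library instead. The helper name build_url is hypothetical, and whether the endpoint accepts this reduced parameter set has not been verified here; only the base address and the query, start, and where parameter names are taken from the code above.

```python
from urllib.parse import urlencode

# Base address copied from the code above.
BASE = "https://s.search.naver.com/p/newssearch/search.naver"

def build_url(keyword, page):
    """Hypothetical helper: build the search URL for a 0-based page number."""
    params = {
        "query": keyword,         # urlencode percent-encodes the Hangul as UTF-8
        "start": 1 + page * 10,   # same offsets as the loop above: 1, 11, 21, ...
        "where": "news_tab_api",
    }
    return BASE + "?" + urlencode(params)

for page in range(3):
    print(build_url("손흥민", page))
```

Building the query string this way keeps the search term readable in the source and makes it harder to leave a stale keyword behind in a copied URL.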