
Beautiful Soup (HTML parser)

From Wikipedia, the free encyclopedia

Revision as of 02:35, 1 June 2023

Beautiful Soup
Original author(s): Leonard Richardson
Initial release: 2004
Stable release: 4.12.3[1] / 17 January 2024
Written in: Python
Platform: Python
Type: HTML parser library, web scraping
License: Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (versions 4 and up)[2]
Website: www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags; it is named after tag soup. It creates a parse tree for parsed pages that can be used to extract data from HTML,[3] which is useful for web scraping.[2][4]
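As a minimal sketch, the following parses a fragment whose list items are never closed and then extracts data from the resulting tree with Python's built-in "html.parser" backend. The markup string is made up for illustration. Different backends that Beautiful Soup supports (html.parser, lxml, html5lib) can repair missing end tags differently. For that reason the sketch relies only on the document order of the links, not on the exact nesting.

```python
from bs4 import BeautifulSoup

# Deliberately malformed "tag soup": neither <li> element is ever closed.
html = "<ul><li><a href='/a'>Alpha</a><li><a href='/b'>Beta</a></ul>"

# "html.parser" selects Python's built-in parser backend (no extra install).
soup = BeautifulSoup(html, "html.parser")

# Despite the missing </li> tags, the result is a searchable parse tree;
# find_all returns matching tags in document order.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]
print(links)  # [('Alpha', '/a'), ('Beta', '/b')]
```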

Beautiful Soup was started by Leonard Richardson, who continues to contribute to the project.[5] The project is also supported by Tidelift, a paid subscription service for open-source maintenance.[6]

Code example

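#!/usr/bin/env python3
# Anchor extraction from HTML document
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Fetch the page; the response is file-like, so it can be passed
# directly to BeautifulSoup, which reads it.
with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
    soup = BeautifulSoup(response, 'html.parser')
    # Print the href of every <a> tag, or '/' when the tag has none.
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))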
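The script above needs network access, and it prints "/" for any anchor that has no href attribute. The following variant skips such anchors and resolves relative links against a base URL. It is a sketch, not part of Beautiful Soup itself: the base URL, the markup, and the extract_links helper are hypothetical, and urljoin comes from Python's standard library.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL and page content, used instead of a live request
# so the example runs offline.
BASE_URL = "https://example.org/docs/"
html = """
<p>See <a href="intro.html">the intro</a>, <a name="top">an anchor
without href</a>, and <a href="https://example.com/">an external site</a>.</p>
"""


def extract_links(markup, base_url):
    """Return absolute URLs for every <a> element that has an href."""
    soup = BeautifulSoup(markup, "html.parser")
    # href=True keeps only tags on which the href attribute is present.
    anchors = soup.find_all("a", href=True)
    # urljoin turns relative hrefs into absolute URLs; absolute ones pass through.
    return [urljoin(base_url, a["href"]) for a in anchors]


print(extract_links(html, BASE_URL))
# ['https://example.org/docs/intro.html', 'https://example.com/']
```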

Release

Beautiful Soup 3 was the official release line of Beautiful Soup from May 2006 to March 2012. The current release line is Beautiful Soup 4.x, which can be installed with pip install beautifulsoup4.
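After running pip install beautifulsoup4, you can use a quick check like the following to confirm that the installed release belongs to the 4.x line. It is a sketch that assumes Python 3.8 or later for importlib.metadata. The distribution name used by pip (beautifulsoup4) differs from the module name used in import statements (bs4).

```python
from importlib.metadata import version

import bs4  # the package installed as "beautifulsoup4" is imported as "bs4"

# Version as recorded in the installed distribution's metadata.
dist_version = version("beautifulsoup4")

# Version as reported by the imported module itself.
module_version = bs4.__version__

print(dist_version, module_version)
```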

Support for Python 2.7 was retired in 2021; release 4.9.3 was the last to support it.[7]

See also

References

  1. https://git.launchpad.net/beautifulsoup/tree/CHANGELOG. Retrieved 18 January 2024.
  2. "Beautiful Soup website". Retrieved 18 April 2012. "Beautiful Soup is licensed under the same terms as Python itself."
  3. Hajba, Gábor László (2018), Hajba, Gábor László (ed.), "Using Beautiful Soup", Website Scraping with Python: Using BeautifulSoup and Scrapy, Apress, pp. 41–96, doi:10.1007/978-1-4842-3925-4_3, ISBN 978-1-4842-3925-4.
  4. Real Python. "Beautiful Soup: Build a Web Scraper With Python – Real Python". realpython.com. Retrieved 1 June 2023.
  5. "Code : Leonard Richardson". Launchpad. Retrieved 19 September 2020.
  6. Tidelift. "beautifulsoup4 | pypi via the Tidelift Subscription". tidelift.com. Retrieved 19 September 2020.
  7. Richardson, Leonard (7 September 2021). "Beautiful Soup 4.10.0". beautifulsoup. Google Groups. Retrieved 27 September 2022.