There are often situations in which it is useful for a script to incorporate information from a website. For example, a program may need to know what the weather is like in a certain part of the country. Python, a language with thousands of third-party packages, makes this fast and easy.
Installing the web-scraping tool for Python
- On the computer where Python is installed, open the command prompt (on Windows, press Win+R and type cmd in the text box).
- Type "pip install beautifulsoup4" into the command prompt. The scraping steps in the next section also import the requests package, so run "pip install requests" as well if it is not already installed.
- This installs BeautifulSoup, a Python package that makes it easy to retrieve information from a web page's HTML.
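One quick way to confirm that the install worked is to import both packages and print their versions. This is only a minimal sanity check, assuming that both beautifulsoup4 and requests were installed with pip as described above; if either import fails, the package is not available to the Python interpreter you are running.

```python
# Sanity check: both imports should succeed after the pip installs.
import bs4
import requests

# Each package exposes its version string as __version__.
print("beautifulsoup4 version:", bs4.__version__)
print("requests version:", requests.__version__)
```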
Using BeautifulSoup to scrape a website
- Begin the code with the lines "import requests" and "from bs4 import BeautifulSoup".
- As with any other Python package, these modules must be imported before they can be used.
- Create a variable to store the unprocessed web data and set it equal to requests.get("URL"), where URL is the address of the website you are scraping. This sends an HTTP GET request to the site and stores the response, including the page's raw HTML.
- Create a second variable, which will be used to search the page for particular HTML tags and attributes. Set it equal to BeautifulSoup(var.text, "html.parser"), where var is the name of the variable from the previous step.
- Using the variable from the previous step, you can now build a list of all elements with a certain tag, optionally limited to those with a certain HTML class. Assign the result of that variable's .find_all("x") method to a new variable, where "x" is the tag you are looking for. To match only a specific class of that tag, add a second argument: .find_all("x", class_="nameOfClass"). Note that the class name must be a quoted string. Call .get_text() on any element in the list to get the text inside it.
- You now have a list of every matching element, in the order the elements appear on the page.
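The steps above can be sketched as a short script. This is a minimal illustration, not a definitive implementation: the helper names fetch_html and extract_text, the SAMPLE_HTML markup, the "forecast" class name, and the 10-second timeout are all assumptions made for the example. The sample markup stands in for a real page so the script runs without a network connection.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML as a string (hypothetical helper)."""
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def extract_text(html, tag, css_class=None):
    """Return the text of every matching element, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    if css_class is None:
        elements = soup.find_all(tag)
    else:
        elements = soup.find_all(tag, class_=css_class)
    # find_all returns Tag objects; get_text pulls out the text inside each one.
    return [element.get_text(strip=True) for element in elements]


# Hypothetical sample markup standing in for a downloaded weather page.
SAMPLE_HTML = """
<html><body>
  <p class="forecast">Sunny, 72 F</p>
  <p class="forecast">Rain, 58 F</p>
  <p>Updated hourly</p>
</body></html>
"""

if __name__ == "__main__":
    print(extract_text(SAMPLE_HTML, "p"))
    print(extract_text(SAMPLE_HTML, "p", css_class="forecast"))
```

To scrape a live site instead, pass the result of fetch_html("URL") as the first argument to extract_text, with URL replaced by the address of the page you want.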