Purpose

To build and test a small-scale web scraper in Python, using requests and Beautiful Soup to scrape HTML pages, and pandas and numpy to build and edit the resulting dataframe. This was particularly helpful for getting familiar with reading HTML and finding the right way to select the data elements to report on.

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import datetime

Web Scraper

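# Fetch the adoption listing page and parse the returned HTML
url = "https://fetchwi.org/adopt"
results = requests.get(url, timeout=30)
results.raise_for_status()  # stop on an HTTP error instead of parsing an error page
soup = BeautifulSoup(results.text, "html.parser")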

names = []
tags = []
pagelinks = []

# Each dog's listing lives in its own summary container
name_div = soup.find_all('div', class_="summary-content sqs-gallery-meta-container")

for container in name_div:
    # The title link holds both the dog's name and the relative URL of its page
    title_link = container.find('a', class_="summary-title-link")
    names.append(title_link.text)
    pagelinks.append(title_link['href'])

    # Tags sit in the primary metadata div inside the below-content metadata container
    metadata_divs = container.find_all('div', class_="summary-metadata-container summary-metadata-container--below-content")

    for metadata_div in metadata_divs:
        tag_divs = metadata_div.find_all('div', class_="summary-metadata summary-metadata--primary")

        for tag_div in tag_divs:
            tags.append(tag_div.text)
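
The loop above keeps names, links, and tags in three separate lists, which stay aligned only if every listing yields exactly one tag entry. As a minimal sketch of an alternative, the function below builds one record per listing, so a dog with no tag block gets an empty string rather than shifting later rows. The sample markup is hypothetical and heavily simplified; it only reuses the class names from the code above, and `parse_listings` is an illustrative helper name, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the class names used above.
SAMPLE_HTML = """
<div class="summary-content sqs-gallery-meta-container">
  <a class="summary-title-link" href="/doggos/scooby">Scooby</a>
  <div class="summary-metadata-container summary-metadata-container--below-content">
    <div class="summary-metadata summary-metadata--primary">Crate trained, Housebroken</div>
  </div>
</div>
<div class="summary-content sqs-gallery-meta-container">
  <a class="summary-title-link" href="/doggos/simone">Simone</a>
</div>
"""


def parse_listings(html):
    """Return one dict per listing so name, link, and tags cannot drift apart."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for container in soup.select("div.summary-content.sqs-gallery-meta-container"):
        title_link = container.select_one("a.summary-title-link")
        if title_link is None:
            continue  # skip containers that have no title link
        tag_divs = container.select("div.summary-metadata--primary")
        # Join all tag blocks into one string; empty string when there are none
        tag_text = ", ".join(div.get_text(strip=True) for div in tag_divs)
        rows.append({
            "name": title_link.get_text(strip=True),
            "link": title_link.get("href", ""),
            "tags": tag_text,
        })
    return rows


rows = parse_listings(SAMPLE_HTML)
for row in rows:
    print(row)
```

A list of dicts like this can be passed straight to `pd.DataFrame(rows)`, which removes the need for a separate length check.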

## Additional date column to potentially track over-time changes
date = datetime.datetime.now()

dates = [date] * len(tags)

Length Checking

print("names:", len(names))
print("tags:", len(tags))
print("href:", len(pagelinks))
    names: 54
    tags: 54
    href: 54
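
The printout above is checked by eye. A small sketch of the same check as code is shown below: it raises an error when the lists differ in length, so a mismatch stops the run before the dataframe is built. `check_equal_lengths` is a hypothetical helper, and the lists passed to it here are placeholder data, not scraped results.

```python
def check_equal_lengths(**columns):
    """Raise ValueError if the named lists differ in length; return the shared length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    # All lengths are equal here; return that length (0 if nothing was passed)
    return next(iter(lengths.values()), 0)


# Placeholder data standing in for the scraped lists
n_rows = check_equal_lengths(
    names=["Scooby", "Simone"],
    tags=["Crate trained, Housebroken", ""],
    href=["/doggos/scooby", "/doggos/simone"],
)
print("rows:", n_rows)
```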

Creating Doggo Datatable

doggos = pd.DataFrame({
    'name': names,
    'tags': tags,
    'link': pagelinks,
    'date': dates
})

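# Strip newline characters left over from the HTML
doggos = doggos.replace(to_replace=r"\n", value="", regex=True)
# Prefix relative links (those starting with "/") with a domain
doggos["link"] = doggos["link"].replace(to_replace=r"^/", value="fetch.org/", regex=True)
# Store the text columns as pandas strings; keep the timestamp as a plain object
doggos = doggos.astype({
    'name': "string",
    'tags': "string",
    'link': "string",
    'date': "object"
})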
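
The link cleanup above prefixes relative paths with `fetch.org/`, while the page being scraped is served from `fetchwi.org`; the notebook does not say which host is intended. As a hedged sketch, assuming the links should resolve against the scraped page, the standard library's `urljoin` can build absolute URLs without a regex. `BASE_URL` and `absolutize` are illustrative names, not part of the original code.

```python
from urllib.parse import urljoin

# Assumption: links are relative to the page that was scraped above
BASE_URL = "https://fetchwi.org/adopt"


def absolutize(href, base=BASE_URL):
    """Resolve a possibly relative href against the base page URL."""
    return urljoin(base, href)


print(absolutize("/doggos/scooby"))
print(absolutize("https://example.com/already-absolute"))
```

On a dataframe this could be applied with `doggos["link"].map(absolutize)`. Links that are already absolute pass through unchanged.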

doggos
    name               tags                                               link                                date
 0  Scooby             Crate trained, Housebroken, Good in the car, C...  fetch.org/doggos/scooby             2022-01-23 16:10:22.349954
 1  Clasby             Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/clasby             2022-01-23 16:10:22.349954
 2  Blue               Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/blue2              2022-01-23 16:10:22.349954
 3  Myah               Good with dogs, Crate trained, Good in the car...  fetch.org/doggos/myah               2022-01-23 16:10:22.349954
 4  Betty              Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/betty              2022-01-23 16:10:22.349954
 5  Lyla               Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/lyla               2022-01-23 16:10:22.349954
 6  Jasper             Good with dogs, Crate trained, Good in the car...  fetch.org/doggos/jasper             2022-01-23 16:10:22.349954
 7  Jolie              Housebroken, Crate trained, Good for beginner ...  fetch.org/doggos/jolie2             2022-01-23 16:10:22.349954
 8  Elliot             Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/elliot2            2022-01-23 16:10:22.349954
 9  Shiloh             Good with dogs, Crate trained, Housebroken, Ca...  fetch.org/doggos/shiloh2            2022-01-23 16:10:22.349954
10  Apollo             Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/apollo2            2022-01-23 16:10:22.349954
11  Sugar              Housebroken, Good in the car, Can free roam wh...  fetch.org/doggos/sugar3             2022-01-23 16:10:22.349954
12  Marley             Good with dogs, Good with cats, Crate trained,...  fetch.org/doggos/marley             2022-01-23 16:10:22.349954
13  Ruby               Good with dogs, Crate trained, Good for beginn...  fetch.org/doggos/ruby               2022-01-23 16:10:22.349954
14  Ripley             Crate trained, Housebroken, Good in the car, W...  fetch.org/doggos/ripley             2022-01-23 16:10:22.349954
15  Beau               Good with dogs, Good in the car, Working on po...  fetch.org/doggos/beau               2022-01-23 16:10:22.349954
16  Sonny              Good with dogs, Good with older kids, Housebro...  fetch.org/doggos/sonny              2022-01-23 16:10:22.349954
17  Pound Cake         Good with dogs, Good with cats, Good in the ca...  fetch.org/doggos/pound-cake         2022-01-23 16:10:22.349954
18  Cheesecake         Good with dogs, Good with cats, Good with kids...  fetch.org/doggos/cheesecake         2022-01-23 16:10:22.349954
19  Sox                Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/sox                2022-01-23 16:10:22.349954
20  Cajun              Housebroken, Good in the car, Good running bud...  fetch.org/doggos/cajun2             2022-01-23 16:10:22.349954
21  Dexter             Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/dexter             2022-01-23 16:10:22.349954
22  Herky              Housebroken, Crate trained, Good in the car, W...  fetch.org/doggos/herky              2022-01-23 16:10:22.349954
23  Kevin              Good with dogs, Crate trained, Good for beginn...  fetch.org/doggos/kevin              2022-01-23 16:10:22.349954
24  Jrue               Good with dogs, Good with cats, Crate trained,...  fetch.org/doggos/jrue               2022-01-23 16:10:22.349954
25  Jackie             Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/jackie             2022-01-23 16:10:22.349954
26  Pride              Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/pride              2022-01-23 16:10:22.349954
27  Jordan             Good with dogs, Good in the car, Enjoys doggy ...  fetch.org/doggos/jordan             2022-01-23 16:10:22.349954
28  Stitch             Good with dogs, Crate trained, Housebroken, Ca...  fetch.org/doggos/stitch             2022-01-23 16:10:22.349954
29  Beck               Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/beck               2022-01-23 16:10:22.349954
30  Lucky              Good with dogs after slow intros, Good with ca...  fetch.org/doggos/lucky              2022-01-23 16:10:22.349954
31  Birdi              Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/birdi              2022-01-23 16:10:22.349954
32  Major              Housebroken, Can free roam when alone, Good in...  fetch.org/doggos/major              2022-01-23 16:10:22.349954
33  Marvin             Good with kids, Good with cats, Housebroken, C...  fetch.org/doggos/marvin             2022-01-23 16:10:22.349954
34  Tova               Good with dogs, Housebroken, Good in the car, ...  fetch.org/doggos/tova               2022-01-23 16:10:22.349954
35  Kuma               Crate trained, Housebroken, Walks well on leas...  fetch.org/doggos/kuma               2022-01-23 16:10:22.349954
36  Zoey               Needs slow intros to humans, Crate trained, Wa...  fetch.org/doggos/zoey-ditc          2022-01-23 16:10:22.349954
37  Simone                                                                fetch.org/doggos/simone             2022-01-23 16:10:22.349954
38  Twinkle                                                               fetch.org/doggos/twinkle            2022-01-23 16:10:22.349954
39  Hella                                                                 fetch.org/doggos/hella              2022-01-23 16:10:22.349954
40  Black Forest Cake                                                     fetch.org/doggos/black-forest-cake  2022-01-23 16:10:22.349954
41  Alice                                                                 fetch.org/doggos/alice              2022-01-23 16:10:22.349954
42  Amelie                                                                fetch.org/doggos/amelie             2022-01-23 16:10:22.349954
43  Trail                                                                 fetch.org/doggos/trail              2022-01-23 16:10:22.349954
44  Arby                                                                  fetch.org/doggos/arby               2022-01-23 16:10:22.349954
45  Dotty                                                                 fetch.org/doggos/dotty              2022-01-23 16:10:22.349954
46  Twerp                                                                 fetch.org/doggos/twerp              2022-01-23 16:10:22.349954
47  Bundt Cake                                                            fetch.org/doggos/bundt-cake         2022-01-23 16:10:22.349954
48  Luciano                                                               fetch.org/doggos/luciano            2022-01-23 16:10:22.349954
49  Luna                                                                  fetch.org/doggos/luna2              2022-01-23 16:10:22.349954
50  Roscoe                                                                fetch.org/doggos/roscoe2            2022-01-23 16:10:22.349954
51  Spock                                                                 fetch.org/doggos/spock              2022-01-23 16:10:22.349954
52  August                                                                fetch.org/doggos/august             2022-01-23 16:10:22.349954
53  Cupcake                                                               fetch.org/doggos/cupcake            2022-01-23 16:10:22.349954

Selecting Dogs that Fit my Lifestyle

potential_doggos = doggos.loc[doggos["tags"].str.contains("Could live in an apartment")]

potential_doggos
    name               tags                                               link                                date
 8  Elliot             Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/elliot2            2022-01-23 16:10:22.349954
 9  Shiloh             Good with dogs, Crate trained, Housebroken, Ca...  fetch.org/doggos/shiloh2            2022-01-23 16:10:22.349954
11  Sugar              Housebroken, Good in the car, Can free roam wh...  fetch.org/doggos/sugar3             2022-01-23 16:10:22.349954
26  Pride              Good with dogs, Crate trained, Housebroken, Go...  fetch.org/doggos/pride              2022-01-23 16:10:22.349954
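
The filter above matches a single exact phrase. A possible extension, sketched below under assumptions, requires several tags at once and ignores letter case, treating missing tags as empty text. `with_all_tags` is a hypothetical helper, and the rows in `sample` are invented placeholders rather than scraped data.

```python
import pandas as pd

# Placeholder rows invented for illustration; not taken from the scraped table
sample = pd.DataFrame({
    "name": ["A", "B", "C"],
    "tags": [
        "Good with dogs, Could live in an apartment",
        "Good with cats",
        None,
    ],
})


def with_all_tags(df, wanted):
    """Keep rows whose 'tags' text contains every phrase in `wanted`, ignoring case."""
    tags = df["tags"].fillna("").astype(str).str.lower()
    mask = pd.Series(True, index=df.index)
    for phrase in wanted:
        # regex=False treats the phrase as literal text, not a pattern
        mask = mask & tags.str.contains(phrase.lower(), regex=False)
    return df.loc[mask]


apartment_dogs = with_all_tags(sample, ["good with dogs", "Could live in an apartment"])
print(apartment_dogs["name"].tolist())
```

Matching literal text rather than regular expressions avoids surprises if a tag ever contains characters such as parentheses or plus signs.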

Final Thoughts

This works fairly well! I had some trouble grabbing the correct tag elements, but the scraper does what I need overall. Potential next steps would be to set up a recurring email to myself with this information, or to build a second scraper that follows the links from the filtered table above.
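
As a rough sketch of the email idea, the snippet below only builds the message with the standard library's `email.message.EmailMessage`; it does not send anything. The helper name `build_digest`, the subject wording, and the `example.com` addresses are all placeholders. Actually sending it would need an SMTP server and credentials (for instance via `smtplib`), and making it recurring would need a scheduler such as cron, neither of which is covered here.

```python
from email.message import EmailMessage


def build_digest(rows, sender, recipient):
    """Build a plain-text email listing each dog's name and link."""
    msg = EmailMessage()
    msg["Subject"] = f"Adoptable dogs matching my filters: {len(rows)}"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"{row['name']}: {row['link']}" for row in rows]
    msg.set_content("\n".join(lines) or "No matches today.")
    return msg


# One row copied from the filtered table above; the addresses are placeholders
digest = build_digest(
    [{"name": "Elliot", "link": "fetch.org/doggos/elliot2"}],
    sender="me@example.com",
    recipient="me@example.com",
)
print(digest["Subject"])
print(digest.get_content())
```

With a dataframe, the rows could come from `potential_doggos[["name", "link"]].to_dict("records")`.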