Wikipedia Scraper

When a new band catches my attention, I always wonder where they came from. It can be a nice frame of reference that explains why the music came out the way it did. Did Kurt’s rainy Seattle neighborhood enable him to create such expressive music? Could Bruce have sung about his home state if it were anywhere but New Jersey? What would the Red Hot Chili Peppers be singing about if they weren't from California? These are the questions that come to mind when I am listening.

When I first approached Localify.org, I loved that you could explore the local artists of any town in America. However, they only had about 6,000 artist-to-city connections at the time. Their main source of information was BandsInTown, which was limited, unconfirmed, and rife with spelling errors. So the first project I started on was finding a better source of information, so that we could expand the database and therefore the usefulness of the site. And we all know the greatest source of information on the internet... Wikipedia!

The sidebar of RHCP's Wikipedia page

As I looked through artists' Wikipedia pages, I found that most of them had their origin listed in the sidebar. Perfect! All I had to do was write a script that could capture this information for every artist in our database. Here is an outline of how the script works:

  • Assemble a list of all the artists in our database that have not yet been searched for on Wikipedia
  • For each artist
    • Find the corresponding Wikipedia page
      • Try to access en.wikipedia.org/wiki/artist_name
      • Confirm that the Wikipedia page is for a musician (search for keywords like "discography" or "tour")
      • If not found, try again by adding extensions (e.g. " (band)") to the Wikipedia URL
    • Search the Wikipedia page for information about origin or birthplace
      • This is done with the Python package Beautiful Soup
    • If an origin or birthplace is found, find the corresponding city in our database
    • Create a relation between the artist and the city in our database
Want to see the Python code behind this? Check it out! The scraper begins with the run(self) function. Now that our database has more connections between artists and locations, our system can recommend more local artists to our users!
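To make the outline above a little more concrete, here is a minimal sketch of the page lookup and origin extraction steps. It is not the actual Localify code: the suffix list, the keyword list, and every function name are hypothetical, and it uses the standard library's html.parser in place of Beautiful Soup so that it runs without extra installs. It also skips the download step and works on HTML you have already fetched.

```python
from html.parser import HTMLParser
from urllib.parse import quote

WIKI_BASE = "https://en.wikipedia.org/wiki/"
# Hypothetical lists; the real script's suffixes and keywords may differ.
SUFFIXES = ["", " (band)", " (musician)", " (singer)"]
MUSIC_KEYWORDS = ("discography", "tour")


def candidate_urls(artist_name):
    """Return the Wikipedia URLs to try for an artist, plain title first."""
    titles = [(artist_name + suffix).replace(" ", "_") for suffix in SUFFIXES]
    return [WIKI_BASE + quote(title, safe="()") for title in titles]


def looks_like_musician(page_text):
    """Crude check: does the page mention any music-related keyword?"""
    text = page_text.lower()
    return any(keyword in text for keyword in MUSIC_KEYWORDS)


class TableRowParser(HTMLParser):
    """Collect {label: value} pairs from <th>/<td> pairs in the page.

    This reads every table on the page, not only the sidebar, which is
    good enough for a sketch.
    """

    def __init__(self):
        super().__init__()
        self.rows = {}
        self._cell = None   # "th" or "td" while inside a cell
        self._label = None  # text of the most recent <th>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell, self._buf = tag, []

    def handle_data(self, data):
        if self._cell:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        text = " ".join("".join(self._buf).split())  # normalize whitespace
        if tag == "th":
            self._label = text
        elif self._label:
            self.rows[self._label] = text
            self._label = None
        self._cell = None


def find_origin(html):
    """Return the 'Origin' (bands) or 'Born' (people) value, or None."""
    parser = TableRowParser()
    parser.feed(html)
    for label in ("Origin", "Born"):
        if label in parser.rows:
            return parser.rows[label]
    return None
```

The returned text still has to be matched against a city in the database, and a "Born" row usually mixes a birth date in with the place, so a real version needs more cleanup than this.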

The scraper was a success, and added ≈30,000 new artist-to-city connections to our database. In the future, I hope to expand Localify's global scope. Currently we only deal with cities in North America, so bands from outside North America are skipped over. Expanding to the whole world would let us include musicians from everywhere and further improve the usefulness of this web scraper.

If you haven't checked out Localify.org yet, I highly recommend you do. Feel free to check on your favorite artists, and find new artists from your local area. I hope that you enjoy exploring Localify as much as I do!

Quadio Scraper

After creating a Wikipedia scraper to contribute to Localify's database, I looked for other ways I could improve the site. I had heard about a website called Quadio, which hosts college musicians' music to help them connect with other musicians and find an audience. Doug Turnbull and I saw the potential all this data had for Localify: by capturing the Spotify accounts and associated colleges of college musicians, Localify could have a "college scene".

Top chart of musicians from colleges all across the country. A perfect source of data for Localify!

This web scraper was more complex than my Wikipedia scraper. Quadio is a dynamic, single-page website that requires logging in. To build this scraper, I had to use Selenium, a browser automation tool with Python bindings that can drive a headless browser. Essentially, it allows a programmer to instruct a WebDriver to interact with a website. Using Selenium, I was able to write a script that logs into Quadio, captures a list of all of the colleges in their database (contributing to our database), and scrolls through the musicians on the top chart of each college. Each college musician is looked up on Spotify; if there is an associated account, Localify adds this Spotify account and creates a connection between it and the associated college. This is a very efficient way to expand our database of artists and locations and to help young musicians get discovered.
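For a rough idea of what the scrolling part can look like, here is a small sketch. It assumes the chart loads more rows as you scroll down, which is my guess at how such a page behaves and not something confirmed here. The function names are hypothetical, and the login step and element lookups are left out because they depend on Quadio's actual page structure.

```python
import time


def make_headless_driver():
    """Start headless Chrome. Requires the third-party `selenium` package."""
    # Imported here so the scrolling helper below has no hard dependency.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until the page height stops growing.

    Returns the number of scrolls performed. `driver` only needs an
    `execute_script` method, which Selenium WebDriver objects provide.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more rows
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new appeared, so we reached the end
        last_height = new_height
    return max_rounds
```

In use, something like `driver = make_headless_driver()`, then `driver.get(...)` with a chart page, then `scroll_to_bottom(driver)` would leave the full list loaded and ready to read. The `max_rounds` cap is there so a page that never stops growing cannot hang the script.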

New artists are constantly coming to Quadio to share their music. With this web scraper, Localify has tapped into a steady stream of fresh artists. I hope you take the time to check out Localify and see what the college scene looks like at your local college or alma mater. Maybe you'll see someone you know, or discover a new artist!