The 28th Annual CALI Conference for Law School Computing
June 7 & 8, 2018
American University Washington College of Law
Washington DC

Wednesday June 6, 2018 pre-conference activities
  • Sponsor setup at American University Washington College of Law. 1pm – 7pm
  • Conference check-in at American University Washington College of Law. 3pm – 7pm
  • Speaker Meeting (optional) at American in room NT08. 6pm – 6:30pm
Thursday, June 7 • 1:30pm - 2:30pm
Scrape it off: Using or making web scraping tools to gather structured data from webpages

Often, in the course of research, librarians and faculty need to gather data that is presented on websites in tables or lists. Although copy and paste can do the trick when the amount of data is small, at some point we need tools that can automate the job for us. Automation is especially useful when gathering metadata for catalogs or online repositories, where much of the required metadata is already available through other sources but needs to be collected and edited.
First, I will discuss the primary ways in which we can collect data via the internet.
Then, I will focus on the use of pre-made web scraping utilities, covering what they are and how they work, and comparing some of the available free or low-cost options.
Finally, I will talk generally about how you could create your own web scraper, customized for what you need. As an example, I will go through the development of a Python program I created to pull a monthly report of case opinion metadata from a court website, and I will discuss the skills and tools you would need to develop a similar program.
Although I will talk a bit about programming in this session, I will not focus on the specifics of coding. This will be a beginner-friendly introduction to how web scrapers work and what you may need to know if you find yourself needing to use one.
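
For readers who want a feel for what a custom scraper involves, the following is a minimal sketch in Python, not the program discussed in the session. It uses only the standard library's html.parser module to turn an HTML table into a list of records. The class name TableScraper, the function scrape_table, the column names, and the sample case data are all hypothetical and chosen only for illustration; a real court website would have its own page structure, and a real script would first download the page.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _close_cell(self):
        # Store the current cell, collapsing runs of whitespace.
        if self._cell is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        # Store the current row if it holds any cells.
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate pages that omit </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate pages that omit </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def scrape_table(html):
    """Return table rows as dicts keyed by the first (header) row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page content, invented for this example. A real script
# would fetch it, for instance with urllib.request.urlopen(url).
SAMPLE_HTML = """
<table>
  <tr><th>Docket</th><th>Caption</th><th>Filed</th></tr>
  <tr><td>18-0001</td><td>Doe v. Roe</td><td>2018-05-01</td></tr>
  <tr><td>18-0002</td><td>Smith v. Jones</td><td>2018-05-14</td></tr>
</table>
"""

if __name__ == "__main__":
    for record in scrape_table(SAMPLE_HTML):
        print(record)
```

Once the rows are plain dictionaries, they can be written to a spreadsheet-friendly file with the standard csv module and then edited by hand, which matches the collect-then-edit workflow described above. Before pointing a script like this at a live site, it is worth checking the site's terms of use and whether it already offers the data as a download or through an API.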

Ben Carlson

Emerging Technologies Librarian, Villanova University Charles Widger School of Law

Thursday June 7, 2018 1:30pm - 2:30pm EDT
Weinstein Courtroom C116