Making a Ruby Gem

In this post, I will outline the basic setup and process for creating a gem that displays the air quality in a chosen US zipcode. In creating this gem, we need to consider questions like:

Where will we get the data? How will we store this data to show the user? What will the interface for the user look like? What if a zipcode is not recognized by the database? What if information is unavailable?

Let’s begin.

Initialization

As in the proverbial ‘how to make a peanut butter and jelly sandwich’ example, let me not pass over the setup steps that are easy to take for granted.

We will be using the Nokogiri and Pry gems, along with OpenURI, to scrape the information from the website that holds the data. More on this later.
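For reference, those dependencies end up declared in the gemspec. Here is a sketch of the relevant lines (the field values are illustrative placeholders, not the gem’s actual metadata):

```ruby
# Sketch of breathe_in.gemspec (excerpt) -- declares the gems we rely on.
spec = Gem::Specification.new do |s|
  s.name    = "breathe_in"
  s.version = "0.1.0"            # the real gemspec reads BreatheIn::VERSION
  s.summary = "Air quality by US zipcode"
  s.authors = ["..."]

  s.add_dependency "nokogiri"              # HTML parsing
  s.add_development_dependency "pry"       # interactive debugging
end
```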

The basic file structure for the gem was initialized by running bundle gem [new_gem_name], a very convenient tool available through the magic of Bundler. The structure of our gem will be as follows:

.
├── Gemfile
├── README.md
├── Rakefile
├── bin
│   ├── breathe_in
│   ├── console
│   └── setup
├── breathe_in.gemspec
├── config
│   └── environment.rb
├── lib
│   ├── breathe_in
│   │   ├── city.rb
│   │   ├── cli.rb
│   │   ├── scraper.rb
│   │   └── version.rb
│   └── breathe_in.rb
└── spec
    ├── breathe_in_spec.rb
    └── spec_helper.rb

Scraping Data

The first step is to obtain data. The gem must display: 1.) the expected air quality conditions for today as well as 2.) the current conditions, if available. To gather the data, we will utilize the resources available at AirNow.gov, which is compiled by “the U.S. Environmental Protection Agency, National Oceanic and Atmospheric Administration, National Park Service, tribal, state, and local agencies” - in other words, we can trust the data.

To begin, we can define a simple scraping method inside of our scraper.rb file. The method can be as simple as:

  def get_page
    doc = Nokogiri::HTML(open("http://airnow.gov/?action=airnow.local_city&zipcode=90101&submit=Go"))
    binding.pry
  end

Notice that we have already input a zipcode, and we are opening the HTML with OpenURI’s #open method, getting ready to scrape it with Nokogiri. We will use Pry to experiment with finding the correct tags to extract the necessary data. Awesome how powerful that combination is, right?

Now that we have our environment set up, we need to determine what to scrape from the website:

- city name
- today's AQI high (numerical value)
- today's high index (Good, Moderate, etc.)
- current conditions (last updated)
- current AQI conditions (numerical value)
- current conditions index (Good, Moderate, etc.)

After a lot of trial and error that isn’t shown here (much like how the chef on the cooking channel pulls out a perfectly roasted chicken minutes after preparing it), we skip ahead: the text values have been obtained, stripped of unnecessary characters, and converted to integers where applicable. We end up with six methods that ultimately look like this:

  def self.city_name
    city = scraped_pg.css("#pageContent .ActiveCity")
    #returns array of an object with attributes including city name
    city.empty? ? nil : air_quality[:city_name] = city.text.strip
  end

What’s all the extra logic? Well, we know that sometimes data can be unavailable for a particular zipcode, or the city name doesn’t exist (due to a mistyped zipcode). We account for that by calling the #empty? method on each Nokogiri result to see if it has returned an empty array instead of meaningful data.
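To see that guard in isolation, here is a minimal sketch where a plain array of strings stands in for the Nokogiri NodeSet (a NodeSet also responds to #empty?):

```ruby
# The empty?-guard pattern from the scraper methods, in isolation.
# A plain array of strings stands in for a Nokogiri NodeSet here.
def extract_text(nodes)
  nodes.empty? ? nil : nodes.join.strip
end

extract_text([])                    # unrecognized zipcode -> nil
extract_text(["  Los Angeles  "])   # -> "Los Angeles"
```

The same ternary shape appears in each of the six scraper methods.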

Ultimately, we want to save this extracted data as a hash of attributes that can be assigned to a city, so we will store it in a class variable, exposed through a class method called .air_quality. Within this hash, we will assign the attributes the correctly extracted values for that particular zipcode.

To simplify this, we create a method .city_air_quality that runs all the other extraction methods and then returns .air_quality, the hash of attributes for the particular zipcode.

  def self.city_air_quality
    city_name
    today_high
    index_level
    current_conditions_time
    current_conditions_value
    current_conditions_index 
    air_quality
  end

Also, we want to add meaning to the ‘index’ words (Good, Moderate, etc.) that we pull from the website, so we will create a few methods that print the relevant health information. We might as well include a method that outlines detailed information about the numerical ranges for the AQI values.

  def self.index_good
    print "Air quality is considered satisfactory, and air pollution poses little or no risk."
  end

  # etc. for the other index levels...

  def self.AQI_range_information
    information = <<-Ruby
      The Air Quality Index (AQI) translates air quality data into an easily understandable number to identify how clean or polluted the outdoor air...
    Ruby
    ...
  end

One more thing: we will refactor the #get_page method to take in an argument (the zipcode that the user types) and pass it into the website address through string interpolation.

  def self.scraped_page(zipcode)
    begin
      @@scraped = Nokogiri::HTML(open("http://airnow.gov/?action=airnow.local_city&zipcode=#{zipcode}&submit=Go"))
    ...
  end 

We will further refactor this by assigning the result of the Nokogiri request to a class variable (.scraped) to make our scraping methods less cluttered.
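The class-variable caching pattern looks roughly like this sketch, with the network call stubbed out (fetch_page is a hypothetical stand-in for the Nokogiri::HTML(open(...)) request):

```ruby
# Sketch: caching the parsed page in a class variable so the six
# extraction methods can share it without re-fetching.
class Scraper
  def self.scraped_page(zipcode)
    @@scraped = fetch_page(zipcode)
  end

  def self.scraped_pg
    @@scraped
  end

  def self.fetch_page(zipcode)
    "parsed page for #{zipcode}"  # stand-in for the real request
  end
end

Scraper.scraped_page("90210")
Scraper.scraped_pg  # => "parsed page for 90210"
```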

Special Consideration

You may notice a method at the bottom of the file called .under_maintenance.

  def self.under_maintenance #returns true if under maintenance 
    scraped_pg.css("#pageContent .TblInvisibleFixed tr p[style*='color:#F00;']").text.include?("maintenance")
  end

The AirNow.gov website undergoes maintenance every day from 12am-4am EST and data can be sporadically available. I will talk more about this later when we get to the CLI, but know that it exists.

Making a City

Great, we have now established a class that scrapes the data. Next, we need to create a City class that will hold the scraped data.

  def initialize(city_hash={})
    city_hash.each { |key, value| self.send("#{key}=", value) }
    @@cities << self
  end

Every new city will be initialized with a default empty hash, but if a hash is passed in, the city object will be assigned with the respective hash keys and values using the #send method. Also, every new instance of a city will be pushed into a class variable called .cities that will hold all created cities.

  def add_city_air_quality(air_quality_hash)
    air_quality_hash.each { |key, value| self.send("#{key}=", value) }
  end

The other important method to note in this file is the #add_city_air_quality method that takes in a hash. For our use, this method takes in the hash of attributes that we created in the Scraper class (.air_quality_info) and assigns these attributes to the city instance. So after this method is run, each city will have a :city_name, :today_high, :today_index, :last_update_time, :last_update_value, and :last_update_index (if the values are not nil). It will also have a :zipcode attribute that gets passed in through our CLI, but more on that later.
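Putting those pieces together, a condensed sketch of city.rb might look like the following (the attr_accessor list is inferred from the attributes named above):

```ruby
module BreatheIn
  # Condensed sketch of lib/breathe_in/city.rb, inferred from the
  # methods described in this post.
  class City
    attr_accessor :zipcode, :city_name, :today_high, :today_index,
                  :last_update_time, :last_update_value, :last_update_index

    @@cities = []

    def self.cities
      @@cities
    end

    def initialize(city_hash = {})
      city_hash.each { |key, value| send("#{key}=", value) }
      @@cities << self
    end

    def add_city_air_quality(air_quality_hash)
      air_quality_hash.each { |key, value| send("#{key}=", value) }
    end
  end
end

city = BreatheIn::City.new(zipcode: "90210")
city.add_city_air_quality(city_name: "Beverly Hills", today_high: 42)
city.city_name  # => "Beverly Hills"
```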

Putting It All Together

We have scraped the necessary data and now have the ability to assign a city with this data. How do we do this? With a CLI class, of course! Open up cli.rb to see the magic.

Let’s begin with a broad overview to understand the objective of this class. What we want to do:

  1. Greet the user and ask for a zipcode
  2. Scrape the website with that zipcode
  3. Assign the scraped attributes to a city
  4. Display the information
  5. Ask the user if they want to search another zipcode, get AQI information, or exit

Let’s break it down by evaluating the control flow of the class.

  def run
    puts "*Data provided courtesy of AirNow.gov*"
    puts "How safe is it to breathe today?"
    puts ""
    get_information
    check_site_availability
    menu
  end

First, the CLI will be started with the #run method, which initially greets the user with a couple of puts statements.

Then the #get_information method will be invoked.

  def get_information
    get_zipcode
    scrape_data
    if BreatheIn::Scraper.city_name == nil
      puts "That zipcode is not recognized by AirNow.gov."
      get_information
    else
      new_city = BreatheIn::City.new({zipcode: self.class.zipcode})
      assign_attributes(new_city)
      display_information
    end
  end

In turn, this method invokes the #get_zipcode method, which will save the user inputted zipcode in a class variable .zipcode that will be referenced in several other methods.

  def get_zipcode
    input = ""
    until input.match(/\b\d{5}\b/)
      puts "Please enter a valid zipcode and wait a few seconds:"
      puts ""
      input = gets.strip
    end
    @@zipcode = input[/\b\d{5}\b/] # keep only the matched five digits
  end
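As an aside, the \b\d{5}\b pattern matches any run of exactly five digits bounded by word breaks, even inside a longer string:

```ruby
ZIP_PATTERN = /\b\d{5}\b/  # five digits bounded by word breaks

"90210".match?(ZIP_PATTERN)        # => true
"1234".match?(ZIP_PATTERN)         # => false (too short)
"123456".match?(ZIP_PATTERN)       # => false (six digits, no word break)
"zip is 90210".match?(ZIP_PATTERN) # => true (matches the digits within)
```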

After getting a valid zipcode, #get_information will invoke #scrape_data, which will actually call on the #scraped_page method in the Scraper class. .zipcode will be passed into this method and through string interpolation, AirNow.gov will load the relevant data for that zipcode.

  def scrape_data
    BreatheIn::Scraper.scraped_page(self.class.zipcode)
  end

Next in the control flow is a conditional statement that checks if the #city_name method is nil.

  def get_information
    ...
    if BreatheIn::Scraper.city_name == nil
      puts "That zipcode is not recognized by AirNow.gov."
      get_information
    ...

In essence, this weeds out zipcodes that are valid but not recognized by AirNow.gov.

  def get_information
    ...
    else
      new_city = BreatheIn::City.new({zipcode: self.class.zipcode})
      assign_attributes(new_city)
      display_information
    end
  end

Otherwise, if there is indeed data available for the zipcode, a new city instance will be initialized with a hash containing the zipcode value.

This newly created city will now be assigned the attributes that we scraped in the #scrape_data method, by calling on the #assign_attributes method.

  def assign_attributes(new_city)
    attributes = BreatheIn::Scraper.city_air_quality
    ...
  end

#assign_attributes takes in an argument of the new city object that was just created - more on this in a second. But first, this method will call on #city_air_quality in the Scraper class - which scrapes the relevant data from the website and creates a hash of attributes from it.

  def assign_attributes(new_city)
    ...
    attributes[:today_high] = "Data currently unavailable." if !attributes.has_key?(:today_high)
    attributes[:today_index] = "Data currently unavailable." if !attributes.has_key?(:today_index)
    attributes[:last_update_value] = "Data currently unavailable." if !attributes.has_key?(:last_update_value)
    attributes[:last_update_time] = "Data currently unavailable." if !attributes.has_key?(:last_update_time)
    attributes[:last_update_index] = "Data currently unavailable." if !attributes.has_key?(:last_update_index)      
    ...
  end

Data from AirNow.gov is intermittently unavailable for certain zipcodes, during certain time periods, and sometimes due to random system glitches. These conditional statements check whether each scraped value is missing from the hash and, if it is, assign that key a placeholder string. They must check every possible hash key because not every value is nil at the same time. Without this logic, a nil data value would be populated with the previous search’s results.
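As a side note, the repeated conditionals could be collapsed into a loop with ||=, which assigns when a key is absent or its value is nil. A sketch, not the gem’s actual code:

```ruby
# Sketch: filling any missing attribute with a fallback message.
# ||= assigns when the key is absent or its value is nil.
FALLBACK = "Data currently unavailable."
KEYS = %i[today_high today_index last_update_value last_update_time last_update_index]

def fill_missing(attributes)
  KEYS.each { |key| attributes[key] ||= FALLBACK }
  attributes
end

fill_missing({ today_high: 42 })
# today_high stays 42; the other four keys get the fallback string
```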

  def assign_attributes(new_city)
    ...
    city_info_hash = new_city.add_city_air_quality(attributes)
  end

Finally, the #add_city_air_quality method from the City class will be invoked on the city object, new_city, passing in the scraped data hash. As a result, the city will now be associated with a hash of attributes.

  def display_information
    BreatheIn::City.cities.each do |city|
      puts "---------------------"
      puts "City/Area: #{city.city_name}, Zipcode: #{city.zipcode}"
      puts "---------------------"
      puts "Today's High AQI: #{city.today_high}"
      puts "Today's Index: #{city.today_index}"
      health_description(city.today_high) if city.today_high.is_a?(Integer)
      puts "---------------------"
      puts "Last #{city.last_update_time}"
      puts "Current AQI: #{city.last_update_value}"
      puts "Current Index: #{city.last_update_index}"
      health_description(city.last_update_value) if city.last_update_value.is_a?(Integer) 
      puts "---------------------"
    end
  end

Finally, the information will be displayed through the #display_information method. This method iterates through the .cities class variable and puts out the information.

This method also calls on #health_description, which will evaluate the value of :today_index and :last_update_index and return the relevant health message.

  def health_description(level)
    if level.between?(0,50)
      puts "#{BreatheIn::Scraper.index_good}"
    elsif level.between?(51,100)
      puts "#{BreatheIn::Scraper.index_moderate}"
    elsif level.between?(101,150)
      puts "#{BreatheIn::Scraper.index_sensitive}"
    elsif level.between?(151,200)
      puts "#{BreatheIn::Scraper.index_unhealthy}"
    elsif level.between?(201,300)
      puts "#{BreatheIn::Scraper.index_very_unhealthy}"
    elsif level.between?(301,500)
      puts "#{BreatheIn::Scraper.index_hazardous}"
    end
  end

Going back to the control flow, the gem has executed #get_information and returns to #run. The next method #check_site_availability is invoked. This method calls on #under_maintenance in the Scraper class to check if there is a maintenance message present on the page. If so, it will print a disclaimer notice to the user.

  def check_site_availability
    if BreatheIn::Scraper.under_maintenance
      disclaimer = <<-Ruby
        ***AirNow.gov undergoes maintenance from midnight to 4am EST. 
        If information is currently unavailable, please try again later.***
        Ruby
      puts disclaimer
    end
  end

The last method to be called is #menu, which will display a pretty self-explanatory list of the user’s next choices after one search.

  def menu
    input = nil

    while input != 3
      puts ""
      puts "1. Learn more about the AQI values and the ranges."
      puts "2. Choose another zipcode."
      puts "3. Exit."
      puts "Please make a selection:"
      puts ""
      
      input = gets.strip.to_i
      case input
        when 1
          BreatheIn::Scraper.AQI_range_information
        when 2
          BreatheIn::City.reset
          get_information
          check_site_availability
        when 3
          puts "Breathe safely!"
        else
          puts "Please choose 1, 2, or 3."
      end
    end
  end

One note: when the user decides to perform another search, the .cities array class variable will be cleared, as otherwise, the previous results will be listed as well.
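.reset itself isn’t shown in the post; presumably it just empties the class variable, along these lines (a hypothetical sketch):

```ruby
# Hypothetical sketch of City.reset -- clears the stored cities so the
# next search starts from a clean slate.
module BreatheIn
  class City
    @@cities = []

    def self.cities
      @@cities
    end

    def self.reset
      @@cities.clear
    end
  end
end

BreatheIn::City.cities << Object.new
BreatheIn::City.reset
BreatheIn::City.cities  # => []
```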

There you have it! Use this gem every morning to see how toxic that outside air really is. Maybe you’ll find out that on some days, it isn’t better for your health to get off that computer and run around outside for a bit.

View the GitHub repository for this gem.