Web Development: 2016

In this series, we've taken a look at how we can implement a system that allows us to programmatically define custom messages that display on a given administration page in the WordPress back end.

If you've followed along with the series thus far, then you know:

We've laid the groundwork for the plugin that's used throughout this series, and even developed it a bit further.
We've defined and used a custom hook that we can use to render the settings messages.
We've added support for success, warning, and error messages that can be rendered at the top of a given settings page.

As mentioned in the previous tutorial:

But if you've read any of my previous tutorials, you know that I'm not a fan of having duplicated code. Nor am I a fan of having one class do many things. And, unfortunately, that's exactly what we're doing here.

And we're going to address that in this final tutorial. By the end, we'll have a completely refactored solution that uses some intermediate object-oriented principles like inheritance. We'll also have a few methods that we can use programmatically or that can be registered with the WordPress hook system.

Getting Started at the End

At this point you should know exactly what you need in your local development environment. Specifically, you should have the following:

PHP 5.6.25 and MySQL 5.6.28
Apache or Nginx
WordPress 4.6.1
Your preferred IDE or editor

I also recommend the most recent version of the source code as it will allow you to walk through all of the changes that we're going to make. If you don't have it, that's okay, but I recommend reading back over the previous tutorials before going any further.

In the Previous Tutorial

As you may recall (or have ascertained from the comment above), the previous tutorial left us with a single class that was doing too much work.

One way to know this is that if you were to describe what the class was doing, you wouldn't be able to give a single answer. Instead, you'd have to say that it was responsible for handling success messages, warning messages, error messages, and rendering all of them independently of one another.

And though you might make the case that it was "managing custom messages," you wouldn't necessarily be describing just how verbose the class was. That's what we hope to resolve in this tutorial.

In the Final Tutorial

Specifically, we're going to be looking at doing the following:

removing the old settings messenger class
adding a new, more generic settings message class
adding a settings messenger class with which to communicate
introducing methods that we can use independent of WordPress
streamlining how WordPress renders the messages

We have our work cut out for us, so let's go ahead and get started with all of the above.

Refactoring Our Work

When it comes to refactoring our work, it helps to know exactly what it is that we want to do. In our case, we recognize that we have a lot of duplicate code that could be condensed.

Furthermore, we have three different types of messages managed in exactly the same way save for how they are rendered. And in that instance, it's an issue of the HTML class attributes.

Thus, we can generalize that code to focus on a specific type, and we can consolidate a lot of the methods for adding success messages or retrieving error messages by generalizing a method to recognize said type.

Ultimately, we will do that. But first, some housekeeping.

1. Remove the Old Settings Messenger

In the previous tutorials, we've been working with a class called Settings_Messenger. Up to this point, it has served its purpose, but we're going to be refactoring this class throughout the remainder of this tutorial.

When it comes to this type of refactoring, it's easy to want to simply delete the class and start over. There are times in which this is appropriate, but this is not one of them. Instead, we're going to take that class and refactor what's already there.

All of that to say, don't delete the file and get started with a new one. Instead, track with what we're doing throughout this tutorial.

2. A New Settings Message Class

First, let's introduce a Settings_Message class. This represents any type of settings message that we're going to write. That is, it will manage success messages, error messages, and warning messages.

To do this, we'll define the class, introduce a single property, and then we'll instantiate it in the constructor. Check out this code, and I'll explain a bit more below:

<?php

class Settings_Message {

    private $messages;

    public function __construct() {

        $this->messages = array(
            'success'   => array(),
            'error'     => array(),
            'warning'   => array(),
        );
    }
}

Notice that we've created a private attribute, $messages. When the class is instantiated, we create a multidimensional array. Each index, identified either by success, error, or warning, refers to its own array in which we'll be storing the corresponding messages.

Next, we need to be able to add a message, get a message, and get all of the messages. I'll discuss each of these in more detail momentarily.

Adding Messages

First, let's look at how we're adding messages:

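<?php
public function add_message( $type, $message ) {

    // Ignore message types this class doesn't know about.
    if ( ! isset( $this->messages[ $type ] ) ) {
        return;
    }

    $message = sanitize_text_field( $message );

    // Skip duplicates so the same message is never rendered twice.
    if ( in_array( $message, $this->messages[ $type ], true ) ) {
        return;
    }

    array_push( $this->messages[ $type ], $message );
}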

This method takes the incoming string and sanitizes the data. Then it checks to see if the message already exists in the collection for the given type. If so, it simply returns. After all, we don't want duplicate messages.

Otherwise, it adds the message to the collection.

Getting Messages

Retrieving messages comes in two forms:

rendering individual messages by type
rendering the messages in the display of the administration page (complete with HTML sanitization, etc.)

Remember, there are times where we may only want to display warning messages. Other times, we may want to display all of the messages. Since there are two ways of doing this, we can implement one and then take advantage of it in another function.

Sound confusing? Hang with me and I'll explain all of it. The first part we're going to focus on is how to render messages by type (think success, error, or warning). Here's the code for doing that (and it should look familiar):

<?php

public function get_messages( $type ) {

    // Nothing to render if no messages of this type were added.
    if ( empty( $this->messages[ $type ] ) ) {
        return;
    }

    $html  = "<div class='notice notice-$type is-dismissible'>";
    $html .= '<ul>';
    foreach ( $this->messages[ $type ] as $message ) {
        $html .= "<li>$message</li>";
    }
    $html .= '</ul>';
    // Double quotes are needed here so that $type is interpolated.
    $html .= "</div><!-- .notice-$type -->";

    // Only allow the elements and attributes used in the markup above.
    $allowed_html = array(
        'div' => array(
            'class' => array(),
        ),
        'ul' => array(),
        'li' => array(),
    );

    echo wp_kses( $html, $allowed_html );
}

Notice here that we're using much of the same code from the previous tutorial; however, we've generalized it so that it looks at the incoming $type and dynamically applies it to the markup.

This allows us to have a single function for rendering our messages. This isn't all, though. What about the times we want to get all messages? This could be to render on a page or to grab them programmatically for some other processing.

To do this, we can introduce another function:

<?php

public function get_all_messages() {

    foreach ( $this->messages as $type => $message ) {
        $this->get_messages( $type );
    }
}

This method should be easy enough to understand. It simply loops through all of the message types we have in our collection and calls the get_messages function we outlined above for each one.

It still renders them all together (we'll see one use of this in our implementation of a custom hook momentarily). If you wanted to use them for another purpose, you could append the result to a string and return it to the caller, or perform some other programmatic function.

This is but one implementation.

3. The Settings Messenger

That does it for the actual Settings_Message class. But how do we communicate with it? Sure, we can talk to it directly, but if there's an intermediate class, we have some control over what's returned to us without adding more responsibility to the Settings_Message class, right?

Enter the Settings_Messenger. This class is responsible for allowing us to read and write settings messages. I think a case could be made that you could split this up into two classes by responsibility because it both reads and writes but, like a messenger who sends and receives, that's the purpose of this class.

The initial setup of the class is straightforward.

The constructor creates an instance of the Settings_Message class that we can use to send and receive messages.
It associates a method with our custom tutsplus_settings_messages hook we defined in a previous tutorial.

Take a look at the first couple of methods:

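<?php

class Settings_Messenger {

    private $message;

    public function __construct() {
        $this->message = new Settings_Message();
    }

    public function init() {
        add_action( 'tutsplus_settings_messages', array( $this, 'get_all_messages' ) );
    }

    // Callback for the hook registered in init(). The body is assumed here:
    // it hands the work to the Settings_Message method of the same name.
    public function get_all_messages() {
        $this->message->get_all_messages();
    }
}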

Remember from earlier in this series, we have the hook defined in our view, which can be found in settings.php. For the sake of completeness, it's listed here:

<div class="wrap">

    <h1><?php echo esc_html( get_admin_page_title() ); ?></h1>
    <?php do_action( 'tutsplus_settings_messages' ); ?>

    <p class="description">
        We aren't actually going to display options on this page. Instead, we're going
        to use this page to demonstrate how to hook into our custom messenger.
    </p><!-- .description -->
</div><!-- .wrap -->

Notice, however, that this particular hook takes advantage of the messenger's get_all_messages method, which passes the call along to the Settings_Message method of the same name. It doesn't have to use this method. Instead, the hook could be used to render only success messages, or it could call any other method that you want to use.

Adding Messages

Creating the functions to add messages is simple as these functions require a type and the message itself. Remember, the Settings_Message takes care of sanitizing the information so we can simply pass in the incoming messages.

See below where we're adding success, warning, and error messages:

<?php

public function add_success_message( $message ) {
    $this->add_message( 'success', $message );
}

public function add_warning_message( $message ) {
    $this->add_message( 'warning', $message );
}

public function add_error_message( $message ) {
    $this->add_message( 'error', $message );
}

It's easy, isn't it?

Getting Messages

Retrieving messages isn't much different except we just need to provide the type of messages we want to retrieve:

<?php

// Settings_Message::get_messages() echoes the markup itself and returns
// nothing, so these methods only need to make the call.
public function get_success_messages() {
    $this->get_messages( 'success' );
}

public function get_warning_messages() {
    $this->get_messages( 'warning' );
}

public function get_error_messages() {
    $this->get_messages( 'error' );
}

Done and done, right?

But Did You Catch That?

Notice that the methods above all refer to two other methods we haven't actually covered yet. These are private methods that help us simplify the calls above.

Check out the following private methods both responsible for adding and retrieving messages straight from the Settings_Message instance maintained on the messenger object:

<?php

private function add_message( $type, $message ) {
    $this->message->add_message( $type, $message );
}

private function get_messages( $type ) {
    return $this->message->get_messages( $type );
}

And that wraps up the new Settings_Messenger class. All of this is much simpler, isn't it?

Starting the Plugin

It does raise the question, though: How do we start the plugin now that we've had all of these changes?

See the entire function below:

<?php

add_action( 'plugins_loaded', 'tutsplus_custom_messaging_start' );
/**
 * Starts the plugin.
 *
 * @since 1.0.0
 */
function tutsplus_custom_messaging_start() {

    $plugin = new Submenu(
        new Submenu_Page()
    );
    $plugin->init();

    $messenger = new Settings_Messenger();
    $messenger->init();

    $messenger->add_success_message( 'Nice shot kid, that was one in a million!' );
    $messenger->add_warning_message( 'Do not go gently into that good night.' );
    $messenger->add_error_message( 'Danger Will Robinson.' );
}

And that's it.

A few points to note:

If you don't call init on the Settings_Messenger, then no messages will be displayed on your settings page.
The code adds messages to the Settings_Messenger, but it doesn't retrieve any directly because the init method hooks the rendering into the settings page.
If you want to retrieve the messages then you can use the methods we've outlined above.

That's all for the refactoring. This won't work exactly out of the box as there is still some code needed to load all of the PHP files required to get the plugin working; however, the code above focuses on the refactoring which is the point of this entire tutorial.

Conclusion

For a full working version of this tutorial and complete source code that does work out of the box, please download the source code attached to this post on the right sidebar.

I hope that over the course of this material you picked up a number of new skills and ways to approach WordPress development. When looking over the series, we've covered a lot:

custom menus
introducing administration pages
the various message types
defining and leveraging custom hooks
and refactoring object-oriented code

As usual, I'm also always happy to answer questions via the comments, and you can also check out my blog and follow me on Twitter. I usually talk all about software development within WordPress and tangential topics, as well. If you're interested in more WordPress development, don't forget to check out my previous series and tutorials, and the other WordPress material we have here on Envato Tuts+.

Resources

How to Create Toolkit Presets in Adobe Photoshop Lightroom

Wednesday, December 28, 2016

Master the Documentary Interview With These Practical Exercises

In-App Purchases in iOS With Swift 3

How to Create a Typography Dispersion Action in Adobe Photoshop

Building Your First Web Scraper, Part 3

14 Killer Gmail Features to Make Use of Now

Taking CSS Shapes to the Next Level

Unity 2D Joints: Slider, Relative, Spring, and Friction Joints

How to Draw Transport: How to Draw a Military Tank

Tuesday, December 27, 2016

How to Create a Winter Rural Photo Manipulation Scene With Adobe Photoshop

The Value of Your Time: How Much Is an Hour Worth to You?

The Best PowerPoint Templates of 2016 (PPT Presentation Designs)

How to Create a Unicorn Illustration in Adobe Illustrator

What's New in Swift 3?

Programming With Yii2: Helpers

3 Methods for Automatic Browser Reloading

How to Embrace the Creative Limitations of Smartphone Photography

Monday, December 26, 2016

30 Best Photoshop Collage Templates

How to Use PowerPoint Slide Master View in 60 Seconds

Uploading Files With Rails and Shrine

There are many file uploading gems out there like CarrierWave, Paperclip, and Dragonfly, to name a few. They all have their specifics, and probably you've already used at least one of these gems.

Today, however, I want to introduce a relatively new, but very cool solution called Shrine, created by Janko Marohnić. In contrast to some other similar gems, it has a modular approach, meaning that every feature is packed as a module (or plugin in Shrine's terminology). Want to support validations? Add a plugin. Wish to do some file processing? Add a plugin! I really love this approach as it allows you to easily control which features will be available for which model.

In this article I am going to show you how to:

integrate Shrine into a Rails application
configure it (globally and per-model)
add the ability to upload files
process files
add validation rules
store additional metadata and employ file cloud storage with Amazon S3

The source code for this article is available on GitHub.

The working demo can be found here.

Integrating Shrine

To start off, create a new Rails application without the default testing suite:

rails new FileGuru -T

I will be using Rails 5 for this demo, but most of the concepts apply to versions 3 and 4 as well.

Drop the Shrine gem into your Gemfile:

gem "shrine"

Then run:

bundle install

Now we will require a model that I am going to call Photo. Shrine stores all file-related information in a special text column ending with a _data suffix. Create and apply the corresponding migration:

rails g model Photo title:string image_data:text
rails db:migrate

Note that for older versions of Rails, the latter command should be:

rake db:migrate

Configuration options for Shrine can be set both globally and per-model. Global settings are done, of course, inside the initializer file. There I am going to hook up the necessary files and plugins. Plugins are used in Shrine to extract pieces of functionality into separate modules, giving you full control of all the available features. For example, there are plugins for validation, image processing, caching attachments, and more.
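
To make the "global versus per-model" idea more concrete, here is a toy sketch in plain Ruby. It is not Shrine's implementation; TinyUploader, SizeLimit, and Logging are hypothetical names made up for this illustration. It only shows how loading a module into a base class affects every subclass, while loading it into a subclass stays local.

```ruby
# A toy illustration of plugin-style composition; this is NOT Shrine's real code.
class TinyUploader
  # Plugins loaded into this class, tracked separately for every subclass.
  def self.plugins
    @plugins ||= []
  end

  # A subclass starts with a copy of whatever its parent has loaded so far.
  def self.inherited(subclass)
    super
    subclass.instance_variable_set(:@plugins, plugins.dup)
  end

  # Load a plugin: mix the module's methods into the class and remember it.
  def self.plugin(mod)
    include mod
    plugins << mod
    self
  end
end

# Hypothetical plugin that adds a size check.
module SizeLimit
  def too_large?(bytes, max: 1_048_576)
    bytes > max
  end
end

# Hypothetical plugin that adds a logging helper.
module Logging
  def log(message)
    "[upload] #{message}"
  end
end

TinyUploader.plugin Logging   # global: every uploader can log

class TinyImageUploader < TinyUploader
  plugin SizeLimit            # local: only this uploader checks sizes
end
```

With this setup, TinyImageUploader instances respond to both log and too_large?, whereas plain TinyUploader instances only respond to log.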

For now, let's add two plugins: one to support ActiveRecord and another one to set up logging. They are going to be included globally. Also, set up file system storage:

config/initializers/shrine.rb

require "shrine"
require "shrine/storage/file_system"

Shrine.plugin :activerecord
Shrine.plugin :logging, logger: Rails.logger

Shrine.storages = {
  cache: Shrine::Storage::FileSystem.new("public", prefix: "uploads/cache"),
  store: Shrine::Storage::FileSystem.new("public", prefix: "uploads/store"),
}

Logger will simply output some debugging information inside the console for you saying how much time was spent to process a file. This can come in handy.

2015-10-09T20:06:06.676Z #25602: STORE[cache] ImageUploader[:avatar] User[29543] 1 file (0.1s)
2015-10-09T20:06:06.854Z #25602: PROCESS[store]: ImageUploader[:avatar] User[29543] 1-3 files (0.22s)
2015-10-09T20:06:07.133Z #25602: DELETE[destroyed]: ImageUploader[:avatar] User[29543] 3 files (0.07s)

All uploaded files will be stored inside the public/uploads directory. I don't want to track these files in Git, so exclude this folder:

.gitignore

public/uploads

Now create a special "uploader" class that is going to host model-specific settings. For now, this class is going to be empty:

models/image_uploader.rb

class ImageUploader < Shrine
end

Lastly, include this class inside the Photo model:

models/photo.rb

include ImageUploader[:image]

[:image] adds a virtual attribute that will be used when constructing a form. The above line can be rewritten as:

  include ImageUploader.attachment(:image)  
  # or
  include ImageUploader::Attachment.new(:image)

Nice! Now the model is equipped with Shrine's functionality, and we can proceed to the next step.

Controller, Views, and Routes

For the purposes of this demo, we'll need only one controller to manage photos. The index page will serve as the root:

photos_controller.rb

class PhotosController < ApplicationController
  def index
    @photos = Photo.all
  end
end

The view:

views/photos/index.html.erb

<h1>Photos</h1>

<%= link_to 'Add Photo', new_photo_path %>

<%= render @photos %>

In order to render the @photos array, a partial is required:

views/photos/_photo.html.erb

<div>
  <% if photo.image_data? %>
    <%= image_tag photo.image_url %>
  <% end %>
  <p><%= photo.title %> | <%= link_to 'Edit', edit_photo_path(photo) %></p>
</div>

image_data? is the attribute query method that ActiveRecord generates for the image_data column. Here it tells us whether a record has an image attached.

image_url is yet another Shrine method that simply returns a path to the original image. Of course, it is much better to display a small thumbnail instead, but we will take care of that later.

Add all the necessary routes:

config/routes.rb

  resources :photos, only: [:new, :create, :index, :edit, :update]

  root 'photos#index'

This is it—the groundwork is done, and we can proceed to the interesting part!

Uploading Files

In this section I will show you how to add the functionality to actually upload files. The controller actions are very simple:

photos_controller.rb

def new
    @photo = Photo.new
end

def create
    @photo = Photo.new(photo_params)
    if @photo.save
        flash[:success] = 'Photo added!'
        redirect_to photos_path
    else
        render 'new'
    end
end

The only gotcha is that for strong parameters you have to permit the image virtual attribute, not the image_data.

photos_controller.rb

private

def photo_params
    params.require(:photo).permit(:title, :image)
end

Create the new view:

views/photos/new.html.erb

<h1>Add photo</h1>

<%= render 'form' %>

The form's partial is also trivial:

views/photos/_form.html.erb

<%= form_for @photo do |f| %>
  <%= render "shared/errors", object: @photo %>

  <%= f.label :title %>
  <%= f.text_field :title %>

  <%= f.label :image %>
  <%= f.file_field :image %>

  <%= f.submit %>
<% end %>

Once again, note that we are using the image attribute, not the image_data.

Lastly, add another partial to display errors:

views/shared/_errors.html.erb

<% if object.errors.any? %>
  <h3>The following errors were found:</h3>

  <ul>
    <% object.errors.full_messages.each do |message| %>
      <li><%= message %></li>
    <% end %>
  </ul>
<% end %>

This is pretty much all—you can start uploading images right now.

Validations

Of course, much more work has to be done in order to complete the demo app. The main problem is that the users may upload absolutely any type of file with any size, which is not particularly great. Therefore, add another plugin to support validations:

config/initializers/shrine.rb

Shrine.plugin :validation_helpers

Set up the validation logic for the ImageUploader:

models/image_uploader.rb

Attacher.validate do
    validate_max_size 1.megabyte, message: "is too large (max is 1 MB)"
    validate_mime_type_inclusion ['image/jpg', 'image/jpeg', 'image/png']
end

I am permitting only JPG and PNG images less than 1MB to be uploaded. Tweak these rules as you see fit.
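
If it helps to see the two rules spelled out, the following plain-Ruby sketch mirrors them without using Shrine at all. UploadedFileInfo, validation_errors, and the wording of the second error message are assumptions made for this illustration; only the 1 MB limit and the list of MIME types come from the uploader above.

```ruby
# Hypothetical stand-in for an uploaded file; Shrine's own objects expose more.
UploadedFileInfo = Struct.new(:size, :mime_type)

MAX_SIZE = 1024 * 1024 # 1 MB in bytes, the same value as 1.megabyte
ALLOWED_TYPES = %w[image/jpg image/jpeg image/png].freeze

# Returns an array of error messages; an empty array means the file passes.
def validation_errors(file)
  errors = []
  errors << "is too large (max is 1 MB)" if file.size > MAX_SIZE
  # Placeholder wording, not a message taken from Shrine.
  errors << "has a MIME type that is not allowed" unless ALLOWED_TYPES.include?(file.mime_type)
  errors
end

small_png = UploadedFileInfo.new(200_000, "image/png")
huge_gif  = UploadedFileInfo.new(5_000_000, "image/gif")

validation_errors(small_png) # => []
validation_errors(huge_gif)  # both rules fail, so two messages are returned
```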

MIME Types

Another important thing to note is that, by default, Shrine will determine a file's MIME type using the Content-Type HTTP header. This header is passed by the browser and set only based on the file's extension, which is not always desirable.

If you wish to determine the MIME type based on the file's contents, then use a plugin called determine_mime_type. I will include it inside the uploader class, as other models may not require this functionality:

models/image_uploader.rb

plugin :determine_mime_type

This plugin is going to use Linux's file utility by default.
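
To see why looking at the contents is more trustworthy than looking at the name, here is a small plain-Ruby sketch that checks the leading "magic" bytes of a file. It is only an illustration of the idea, not what the plugin or the file utility does internally, and sniff_mime_type is a hypothetical helper that knows just the two formats we accept.

```ruby
require "stringio"

# Leading "magic" bytes of the two formats the uploader accepts.
SIGNATURES = {
  "image/png"  => "\x89PNG\r\n\x1A\n".b,
  "image/jpeg" => "\xFF\xD8\xFF".b
}.freeze

# Reads the first bytes of an IO object and returns a MIME type,
# or nil when the content matches none of the known signatures.
def sniff_mime_type(io)
  header = io.read(8).to_s.b
  io.rewind # leave the IO ready for whoever reads it next
  SIGNATURES.each do |mime, magic|
    return mime if header.start_with?(magic)
  end
  nil
end

# A text file that merely claims to be an image is not recognized.
disguised = StringIO.new("I am not really a photo, whatever my name says")
sniff_mime_type(disguised) # => nil
```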

Caching Attached Images

Currently, when a user sends a form with incorrect data, the form will be displayed again with errors rendered above. The problem, however, is that the attached image will be lost, and the user will need to select it once again. This is very easy to fix using yet another plugin called cached_attachment_data:

models/image_uploader.rb

plugin :cached_attachment_data

Now simply add a hidden field into your form.

views/photos/_form.html.erb

<%= f.hidden_field :image, value: @photo.cached_image_data %>
<%= f.label :image %>
<%= f.file_field :image %>

Editing a Photo

Now images can be uploaded, but there is no way to edit them, so let's fix it right away. The corresponding controller's actions are somewhat trivial:

photos_controller.rb

def edit
    @photo = Photo.find(params[:id])
end

def update
    @photo = Photo.find(params[:id])
    if @photo.update_attributes(photo_params)
      flash[:success] = 'Photo edited!'
      redirect_to photos_path
    else
      render 'edit'
    end
end

The same _form partial will be utilized:

views/photos/edit.html.erb

<h1>Edit Photo</h1>

<%= render 'form' %>

Nice, but not enough: users still can't remove an uploaded image. In order to allow this, we'll need—guess what—another plugin:

models/image_uploader.rb

plugin :remove_attachment

It uses a virtual attribute called :remove_image, so permit it inside the controller:

photos_controller.rb

def photo_params
    params.require(:photo).permit(:title, :image, :remove_image)
end

Now just display a checkbox to remove an image if a record has an attachment in place:

views/photos/_form.html.erb

<% if @photo.image_data? %>
    Remove attachment: <%= f.check_box :remove_image %>
<% end %>

Generating a Thumbnail Image

Currently we display original images, which is not the best approach for previews: photos may be large and occupy too much space. Of course, you could simply employ the CSS width and height properties, but that's a bad idea as well. You see, even if the image is set to be small using styles, the user will still need to download the original file, which might be pretty big.

Therefore, it is much better to generate a small preview image on the server side during the initial upload. This involves two plugins and two additional gems. Firstly, drop in the gems:

gem "image_processing"
gem "mini_magick", ">= 4.3.5"

Image_processing is a special gem created by the author of Shrine. It presents some high-level helper methods to manipulate images. This gem, in turn, relies on mini_magick, a Ruby wrapper for ImageMagick. As you've guessed, you'll need ImageMagick on your system in order to run this demo.

Install these new gems:

bundle install

Now include the plugins along with their dependencies:

models/image_uploader.rb

require "image_processing/mini_magick"

class ImageUploader < Shrine
    include ImageProcessing::MiniMagick
    plugin :processing
    plugin :versions
    # other code...
end

Processing is the plugin to actually manipulate an image (for example, shrink it, rotate, convert to another format, etc.). Versions, in turn, allows us to have an image in different variants. For this demo, two versions will be stored: "original" and "thumb" (resized to 300x300).

Here is the code to process an image and store its two versions:

models/image_uploader.rb

class ImageUploader < Shrine
    process(:store) do |io, context|
        { original: io, thumb: resize_to_limit!(io.download, 300, 300) }
    end
end

resize_to_limit! is a method provided by the image_processing gem. It simply shrinks an image down to 300x300 if it is larger and does nothing if it's smaller. Moreover, it keeps the original aspect ratio.
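
The arithmetic behind "shrink only, keep the aspect ratio" can be sketched in a few lines of plain Ruby. limited_dimensions is a hypothetical helper written for this explanation; the real work is done by ImageMagick, whose rounding may differ by a pixel.

```ruby
# Computes the size an image would have after being limited to a bounding box.
# The scale factor never exceeds 1.0, so small images are left untouched.
def limited_dimensions(width, height, max_width, max_height)
  scale = [max_width.to_f / width, max_height.to_f / height, 1.0].min
  [(width * scale).round, (height * scale).round]
end

limited_dimensions(1200, 800, 300, 300) # => [300, 200]
limited_dimensions(200, 100, 300, 300)  # => [200, 100]
```

Note that a 1200x800 photo becomes 300x200 rather than 300x300: the longer side is fitted to the limit and the other side follows the original proportions.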

Now when displaying the image, you just need to provide either the :original or :thumb argument to the image_url method:

views/photos/_photo.html.erb

<div>
  <% if photo.image_data? %>
    <%= image_tag photo.image_url(:thumb) %>
  <% end %>
  <p><%= photo.title %> | <%= link_to 'Edit', edit_photo_path(photo) %></p>
</div>

The same can be done inside the form:

views/photos/_form.html.erb

<% if @photo.image_data? %>
    <%= image_tag @photo.image_url(:thumb) %>
    Remove attachment: <%= f.check_box :remove_image %>
<% end %>

To automatically delete the processed files after uploading is complete, you may add a plugin called delete_raw:

models/image_uploader.rb

plugin :delete_raw

Image's Metadata

Apart from actually rendering an image, you may also fetch its metadata. Let's, for example, display the original photo's size and MIME type:

views/photos/_photo.html.erb

<div>
  <% if photo.image_data? %>
    <%= image_tag photo.image_url(:thumb) %>
    <p>
      Size <%= photo.image[:original].size %> bytes<br>
      MIME type <%= photo.image[:original].mime_type %><br>
    </p>
  <% end %>
  <p><%= photo.title %> | <%= link_to 'Edit', edit_photo_path(photo) %></p>
</div>

What about its dimensions? Unfortunately, they are not stored by default, but this is possible with a plugin called store_dimensions.

Image's Dimensions

The store_dimensions plugin relies on the fastimage gem, so hook it up now:

gem 'fastimage'

Don't forget to run:

bundle install

Now just include the plugin:

models/image_uploader.rb

plugin :store_dimensions

And display the dimensions using the width and height methods:

views/photos/_photo.html.erb

<div>
  <% if photo.image_data? %>
    <%= image_tag photo.image_url(:thumb) %>
    <p>
      Size <%= photo.image[:original].size %> bytes<br>
      MIME type <%= photo.image[:original].mime_type %><br>
      Dimensions <%= "#{photo.image[:original].width}x#{photo.image[:original].height}" %>
    </p>
  <% end %>
  <p><%= photo.title %> | <%= link_to 'Edit', edit_photo_path(photo) %></p>
</div>

Also, there is a dimensions method available that returns an array containing width and height (for example, [500, 750]).
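
If you are curious how the dimensions can be found without decoding the whole picture, the sketch below reads them from the header of a PNG file in plain Ruby. This is an illustration only and makes no claim about how fastimage works internally; png_dimensions is a hypothetical helper, and it handles PNG data exclusively.

```ruby
require "stringio"

PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Returns [width, height] for PNG data, or nil when the header is not a PNG header.
# Layout: 8-byte signature, 4-byte chunk length, "IHDR", then width and height
# stored as 32-bit big-endian integers.
def png_dimensions(io)
  header = io.read(24).to_s.b
  io.rewind
  return nil unless header.bytesize == 24 && header.start_with?(PNG_SIGNATURE)
  return nil unless header[12, 4] == "IHDR".b
  header[16, 8].unpack("N2")
end

# Build just enough of a PNG header to try the helper out.
fake_png = StringIO.new(PNG_SIGNATURE + [13].pack("N") + "IHDR".b + [500, 750].pack("N2"))
png_dimensions(fake_png) # => [500, 750]
```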

Moving to the Cloud

Developers often choose cloud services to host uploaded files, and Shrine does present such a possibility. In this section, I will show you how to upload files to Amazon S3.

As the first step, include two more gems into the Gemfile:

gem "aws-sdk", "~> 2.1"
group :development do
    gem 'dotenv-rails'
end

aws-sdk is required to talk to the S3 API, whereas dotenv-rails will be used to manage environment variables in development. Install the new gems:

bundle install

Before proceeding, you should obtain a key pair to access S3 via API. To get it, sign in (or sign up) to Amazon Web Services Console and navigate to Security Credentials > Users. Create a user with permissions to manipulate files on S3. Here is the simple policy presenting full access to S3:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "*"
    }
  ]
}

Download the created user's key pair. Alternatively, you might use root access keys, but I strongly discourage you from doing that as it's very insecure.

Next, create an S3 bucket to host your files and add a file into the project's root to host your configuration:

.env

S3_KEY=YOUR_KEY
S3_SECRET=YOUR_SECRET
S3_BUCKET=YOUR_BUCKET
S3_REGION=YOUR_REGION

Never ever expose this file to the public, and make sure you exclude it from Git:

.gitignore

.env

Now modify Shrine's global configuration and introduce a new storage:

config/initializers/shrine.rb

require "shrine"
require "shrine/storage/s3"

s3_options = {
  access_key_id:     ENV['S3_KEY'],
  secret_access_key: ENV['S3_SECRET'],
  region:            ENV['S3_REGION'],
  bucket:            ENV['S3_BUCKET'],
}

Shrine.storages = {
  cache: Shrine::Storage::S3.new(prefix: "cache", **s3_options),
  store: Shrine::Storage::S3.new(prefix: "store", **s3_options),
}

That's it! No changes have to be made to the other parts of the app, and you can test this new storage right away. If you are receiving errors from S3 related to incorrect keys, make sure you accurately copied the key and secret, without any trailing spaces and invisible special symbols.
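
Since stray whitespace in the credentials is such a common cause of trouble, a small check like the one below can point at the offending variable. This is a sketch in plain Ruby: s3_env_problems is a hypothetical helper, and only the four variable names come from the .env file above.

```ruby
REQUIRED_S3_KEYS = %w[S3_KEY S3_SECRET S3_BUCKET S3_REGION].freeze

# Returns a hash of problems keyed by variable name; an empty hash means
# every variable is present and free of surrounding whitespace.
def s3_env_problems(env = ENV)
  REQUIRED_S3_KEYS.each_with_object({}) do |key, problems|
    value = env[key]
    if value.nil? || value.strip.empty?
      problems[key] = "is missing"
    elsif value != value.strip
      problems[key] = "has leading or trailing whitespace"
    end
  end
end

# Any hash-like object works, which makes the check easy to try out:
s3_env_problems({ "S3_KEY" => "abc ", "S3_SECRET" => "s", "S3_BUCKET" => "b" })
# => {"S3_KEY"=>"has leading or trailing whitespace", "S3_REGION"=>"is missing"}
```

You could call it from the initializer and raise an error when the returned hash is not empty, but that is a design choice left to you.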

Conclusion

We've come to the end of this article. Hopefully, by now you feel much more confident in using Shrine and are eager to employ it in one of your projects. We have discussed many of this gem's features, but there are even more, like the ability to store additional context along with files and the direct upload mechanism.

Therefore, do browse Shrine's documentation and its official website, which thoroughly describes all available plugins. If you have other questions left about this gem, don't hesitate to post them. I thank you for staying with me, and I'll see you soon!

How To Use a Gimbal: Balancing Your Camera

How to Create a Watercolor Inspired Text Effect in Adobe Photoshop

Sunday, December 25, 2016

Exploring Alfred's Latest Features

Saturday, December 24, 2016

Making Sounds With Ragnarök

Friday, December 23, 2016

101 Free Templates for Adobe After Effects (and How to Make Your Own)

International Artist Feature: Turkey

Envato Market in 60 Seconds: Becoming an Affiliate

How to Create a Double Exposure Action in Adobe Photoshop

How to Submit Your App to the Amazon Appstore

Easy Location-Based iOS Apps With the appyMap Template

Thursday, December 22, 2016

Learn CSS Clipping and Masking in Your Next Coffee Break

How to Create a 3D Ornament Inspired Text Effect in Adobe Photoshop

How to File News and Feature Photo Assignments the Right Way

How to Create a Vintage Camera in Adobe Illustrator

Creating a Custom WordPress Messaging System, Part 3

Wednesday, December 21, 2016

New Course: Build a REST API With Laravel

Create a Custom Alert Controller in iOS 10

How to Create a New Year's Celebration Icon Pack in Adobe Illustrator

Building Your First Web Scraper, Part 2

How to Find Your Google Gmail Contacts and Organize Them Better

Unity 2D Joints: Distance, Hinge, Target, and Fixed Joints

50 Awesome Photo Effect Tutorials

That Was the Year That Was: 2016 in Web Design

Tuesday, December 20, 2016

Google Play Games Services: Creating Events and Quests

How to Create a Christmas-Themed Icon Pack in Adobe Illustrator

15+ Best Agency WordPress Themes: For Creative Site Designs

Quick Tip: How to Get Free Logos for Your Envato Market Previews

How to Concatenate in Excel to Combine Text Strings

Building Your Startup: Security Basics

How to Draw Simple Christmas Icons—With Videos!

How to Create a New Year Calendar From Your Photos

Monday, December 19, 2016

How to Create a Social Media Icon Pack in Adobe Illustrator

New Course: How to Draw Animals in Perspective

Compressing and Extracting Files in Python

If you have been using computers for some time, you have probably come across files with the .zip extension. They are special files that can hold the compressed content of many other files, folders, and subfolders. This makes them pretty useful for transferring files over the internet. Did you know that you can use Python to compress or extract files?

This tutorial will teach you how to use the zipfile module in Python, to extract or compress individual or multiple files at once.

Compressing Individual Files

This one is easy and requires very little code. We begin by importing the zipfile module and then open the ZipFile object in write mode by specifying the second parameter as 'w'. The first parameter is the path to the file itself. Here is the code that you need:

import zipfile
        
jungle_zip = zipfile.ZipFile('C:\\Stories\\Fantasy\\jungle.zip', 'w')
jungle_zip.write('C:\\Stories\\Fantasy\\jungle.pdf', compress_type=zipfile.ZIP_DEFLATED)

jungle_zip.close()

Please note that I will specify the path in all the code snippets in a Windows style format; you will need to make appropriate changes if you are on Linux or Mac.

You can specify different compression methods to compress files. The newer methods, BZIP2 and LZMA, were added in Python 3.3, and some other zip tools don't support them. For this reason, it is safest to use the DEFLATED method. You should still try out the other methods to see the difference in the size of the compressed file.
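If you'd like to see that size difference for yourself, here is a minimal sketch. It assumes that the bz2 and lzma modules are available in your Python build, and it uses made-up sample text and a temporary folder instead of the jungle.pdf file from above, so the numbers you get will differ from those for your own files.

```python
import os
import tempfile
import zipfile

# Hypothetical sample data: repetitive text that every method can shrink.
sample = b"Once upon a time in the jungle. " * 2000

methods = {
    "STORED": zipfile.ZIP_STORED,      # no compression at all
    "DEFLATED": zipfile.ZIP_DEFLATED,
    "BZIP2": zipfile.ZIP_BZIP2,
    "LZMA": zipfile.ZIP_LZMA,
}

sizes = {}
with tempfile.TemporaryDirectory() as workdir:
    for name, method in methods.items():
        archive_path = os.path.join(workdir, name + ".zip")
        # The compression argument sets the default method for the archive.
        with zipfile.ZipFile(archive_path, "w", compression=method) as archive:
            archive.writestr("jungle.txt", sample)
        sizes[name] = os.path.getsize(archive_path)

for name, size in sizes.items():
    print(name, size, "bytes")
```

The STORED entry is only there as a baseline, since it adds the file without compressing it.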

Compressing Multiple Files

This is slightly more complex, as you need to iterate over all the files. The code below should compress all files with the extension pdf in a given folder:

import os
import zipfile

fantasy_zip = zipfile.ZipFile('C:\\Stories\\Fantasy\\archive.zip', 'w')

for folder, subfolders, files in os.walk('C:\\Stories\\Fantasy'):

    for file in files:
        if file.endswith('.pdf'):
            fantasy_zip.write(os.path.join(folder, file), os.path.relpath(os.path.join(folder,file), 'C:\\Stories\\Fantasy'), compress_type = zipfile.ZIP_DEFLATED)

fantasy_zip.close()

This time, we have imported the os module and used its walk() method to go over all files and subfolders inside our original folder. I am only compressing the pdf files in the directory. You can also create different archived files for each format using if statements.
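Here is one way the idea of a separate archive per format could look. This is only a sketch: the zip_by_extension() helper, the file names, and the temporary folders are all made up for the example, and it loops over a list of extensions rather than writing one if statement per format.

```python
import os
import tempfile
import zipfile

def zip_by_extension(source_dir, target_dir, extensions):
    """Create one archive per extension, for example pdf.zip and txt.zip."""
    created = {}
    for ext in extensions:
        archive_path = os.path.join(target_dir, ext.lstrip(".") + ".zip")
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for folder, subfolders, files in os.walk(source_dir):
                for file in files:
                    if file.endswith(ext):
                        full_path = os.path.join(folder, file)
                        # Keep the path relative to source_dir inside the archive.
                        archive.write(full_path, os.path.relpath(full_path, source_dir))
            created[ext] = archive.namelist()
    return created

# Try it on a few placeholder files in temporary folders.
with tempfile.TemporaryDirectory() as source, tempfile.TemporaryDirectory() as target:
    for name in ("jungle.pdf", "notes.txt", "desert.pdf"):
        with open(os.path.join(source, name), "w") as handle:
            handle.write("placeholder content")
    result = zip_by_extension(source, target, [".pdf", ".txt"])
    print(result)
```

Keeping the target folder separate from the source folder avoids walking over the archives while they are being written.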

If you don't want to preserve the directory structure, you can put all the files together by using the following line:

fantasy_zip.write(os.path.join(folder, file), file, compress_type = zipfile.ZIP_DEFLATED)

The write() method takes three parameters in the calls above. The first is the name of the file that we want to compress. The second, arcname, is optional and lets you give the file a different name (or relative path) inside the archive; if nothing is specified, the original name is used. The third, compress_type, is also optional and overrides the archive's compression method for that one file.

Extracting All Files

You can use the extractall() method to extract all the files and folders from a zip file into the current working directory. You can also pass a folder name to extractall() to extract all files and folders into a specific directory. If the folder that you passed does not exist, this method will create one for you. Here is the code that you can use to extract files:

import zipfile
        
fantasy_zip = zipfile.ZipFile('C:\\Stories\\Fantasy\\archive.zip')
fantasy_zip.extractall('C:\\Library\\Stories\\Fantasy')

fantasy_zip.close()

If you want to extract only some of the files, pass their names as a list through the members parameter of extractall().
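As a quick illustration of passing a list of names, the sketch below builds a small throwaway archive in a temporary folder and then extracts two of its three members. The file names are made up for the example.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Build a small archive to experiment with.
    archive_path = os.path.join(workdir, "archive.zip")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in ("jungle.txt", "desert.txt", "ocean.txt"):
            archive.writestr(name, "story text for " + name)

    # Extract only the members named in the list.
    wanted = ["jungle.txt", "ocean.txt"]
    target = os.path.join(workdir, "extracted")
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target, members=wanted)

    extracted = sorted(os.listdir(target))
    print(extracted)  # ['jungle.txt', 'ocean.txt']
```

Every name in the list has to match an entry returned by namelist(), including any folder prefix.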

Extracting Individual Files

This is similar to extracting multiple files. One difference is that this time you need to supply the filename first and the destination path second. Also, you need to use the extract() method instead of extractall(). Here is a basic code snippet to extract individual files.

import zipfile
        
fantasy_zip = zipfile.ZipFile('C:\\Stories\\Fantasy\\archive.zip')
fantasy_zip.extract('Fantasy Jungle.pdf', 'C:\\Stories\\Fantasy')

fantasy_zip.close()

Reading Zip Files

Consider a scenario where you need to see if a zip archive contains a specific file. Up to this point, your only option to do so is by extracting all the files in the archive. Similarly, you may need to extract only those files which are larger than a specific size. The zipfile module allows us to inquire about the contents of an archive without ever extracting it.

Using the namelist() method of the ZipFile object will return a list of all members of an archive by name. To get information on a specific file in the archive, you can use the getinfo() method of the ZipFile object. This will give you access to information specific to that file, like the compressed and uncompressed size of the file or its last modification time. We will come back to that later.

Calling the getinfo() method one by one on all files can be a tiresome process when there are a lot of files that need to be processed. In this case, you can use the infolist() method to return a list containing a ZipInfo object for every single member in the archive. The order of these objects in the list is the same as the order of the entries in the actual zip file.

You can also directly read the contents of a specific file from the archive using the read(file) method, where file is the name of the file that you intend to read. To do this, the archive must be opened in read or append mode.

To get the compressed size of an individual file from the archive, you can use the compress_size attribute. Similarly, to know the uncompressed size, you can use the file_size attribute.
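To tie these methods together, here is a small sketch that inspects an archive without extracting anything. To stay self-contained, it builds the archive in memory with io.BytesIO, which is an assumption of this example; with a real file, you would pass its path to ZipFile instead. The story names and their text are made up.

```python
import io
import zipfile

# Build a throwaway archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("The Thirsty Crow.txt", "A crow found a pitcher of water.")
    archive.writestr("The Lion and the Mouse.txt", "A mouse freed a lion.")

with zipfile.ZipFile(buffer) as archive:
    # Check for a member by name without extracting anything.
    has_crow = "The Thirsty Crow.txt" in archive.namelist()
    has_fox = "The Fox.txt" in archive.namelist()

    # read() returns the member's content as bytes.
    crow_text = archive.read("The Thirsty Crow.txt").decode("utf-8")

    # infolist() gives one ZipInfo object per member.
    member_sizes = [(info.filename, info.file_size) for info in archive.infolist()]

print(has_crow, has_fox)
print(crow_text)
print(member_sizes)
```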

The following code uses the properties and methods we just discussed to extract only those files that have a size below 1MB.

import zipfile

stories_zip = zipfile.ZipFile('C:\\Stories\\Funny\\archive.zip')

for file in stories_zip.namelist():
    if stories_zip.getinfo(file).file_size < 1024*1024:
        stories_zip.extract(file, 'C:\\Stories\\Short\\Funny')
        
stories_zip.close()

To know the time and date when a specific file from the archive was last modified, you can use the date_time attribute. This will return a tuple of six values. The values will be the year, month, day of the month, hours, minutes, and seconds, in that specific order. The year will always be greater than or equal to 1980, and hours, minutes, and seconds are zero-based.

import zipfile

stories_zip = zipfile.ZipFile('C:\\Stories\\Funny\\archive.zip')

thirsty_crow_info = stories_zip.getinfo('The Thirsty Crow.pdf')

print(thirsty_crow_info.date_time)
print(thirsty_crow_info.compress_size)
print(thirsty_crow_info.file_size)
        
stories_zip.close()
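Since date_time is a plain tuple, you can unpack it straight into a datetime object if you want to compare or format the modification time. The sketch below assumes an in-memory archive and a made-up timestamp, which it stores explicitly through a ZipInfo object so that the result is predictable.

```python
import datetime
import io
import zipfile

# Store a member with an explicit, made-up modification time.
buffer = io.BytesIO()
info = zipfile.ZipInfo("The Thirsty Crow.txt", date_time=(2016, 12, 19, 9, 30, 0))
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(info, "A crow found a pitcher of water.")

with zipfile.ZipFile(buffer) as archive:
    stored = archive.getinfo("The Thirsty Crow.txt").date_time
    # The six values map directly onto the datetime constructor arguments.
    modified = datetime.datetime(*stored)

print(stored)                # (2016, 12, 19, 9, 30, 0)
print(modified.isoformat())  # 2016-12-19T09:30:00
```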

This information about the original file size and compressed file size can help you decide whether it is worth compressing a file. I am sure it can be used in some other situations as well.

Final Thoughts

As evident from this tutorial, using the zipfile module to compress files gives you a lot of flexibility. You can compress different files in a directory to different archives based on their type, name, or size. You also get to decide whether you want to preserve the directory structure or not. Similarly, while extracting the files, you can extract them to the location you want, based on your own criteria like size, etc.

To be honest, it was also pretty exciting for me to compress and extract files by writing my own code. I hope you enjoyed the tutorial, and if you have any questions, please let me know in the comments.

How to Make a Media Kit for Your Small Business

Coding Functional Android Apps in Kotlin: Getting Started

Development for Designers: Understanding the Front-End

The Power of PowerShell, Part 2

In part one, I showed you some cool stuff you can do with PowerShell, covered the history of PowerShell, and explored in depth the capabilities of PowerShell as a strong scripting language that supports procedural, functional, and object-oriented programming.

In part two, I'll discuss the interactive shell, the profile, and the prompt, and I'll compare PowerShell to Bash.

PowerShell: The Interactive Shell

PowerShell was designed from the get-go as an interactive shell for Windows sys admins and power users. It focuses on a small number of concepts, a very consistent experience, and an object pipeline to chain and combine commands, filter them, and format them. Its strong help system, which also adheres to a consistent format, is a pleasure to use.

Let's see some of that in action.

Getting Help

The comprehensive help system is accessible through Get-Help.

PS C:\WINDOWS\system32> Help Invoke-WebRequest

NAME
    Invoke-WebRequest
    
SYNOPSIS
    Gets content from a web page on the Internet.
    
    
SYNTAX
    Invoke-WebRequest [-Uri] <Uri> [-Body <Object>] [-Certificate <X509Certificate>] [-CertificateThumbprint <String>] [-ContentType <String>] [-Credential <PSCredential>] [-DisableKeepAlive] [-Headers 
    <IDictionary>] [-InFile <String>] [-MaximumRedirection <Int32>] [-Method {Default | Get | Head | Post | Put | Delete | Trace | Options | Merge | Patch}] [-OutFile <String>] [-PassThru] [-Proxy <Uri>] 
    [-ProxyCredential <PSCredential>] [-ProxyUseDefaultCredentials] [-SessionVariable <String>] [-TimeoutSec <Int32>] [-TransferEncoding {chunked | compress | deflate | gzip | identity}] 
    [-UseBasicParsing] [-UseDefaultCredentials] [-UserAgent <String>] [-WebSession <WebRequestSession>] [<CommonParameters>]
    
    
DESCRIPTION
    The Invoke-WebRequest cmdlet sends HTTP, HTTPS, FTP, and FILE requests to a web page or web service. It parses the response and returns collections of forms, links, images, and other significant HTML 
    elements.
    
    This cmdlet was introduced in Windows PowerShell 3.0.
    

RELATED LINKS
    Online Version: http://ift.tt/2gRZ1P0
    Invoke-RestMethod 
    ConvertFrom-Json 
    ConvertTo-Json 

REMARKS
    To see the examples, type: "get-help Invoke-WebRequest -examples".
    For more information, type: "get-help Invoke-WebRequest -detailed".
    For technical information, type: "get-help Invoke-WebRequest -full".
    For online help, type: "get-help Invoke-WebRequest -online"

To get more detailed help and see examples, use the proper switches: -examples, -detailed, or -full.

If you're not sure what the command name is, just use keywords and PowerShell will show you all the available commands that contain this keyword. Let's see what cmdlets related to CSV are available:

PS C:\Users\the_g> Get-Help -Category Cmdlet csv | select name

Name           
----           
ConvertFrom-Csv
ConvertTo-Csv  
Export-Csv     
Import-Csv

I created a little pipeline where I limited the Get-Help call to the category Cmdlet and then piped it to "select" (an alias for Select-Object) to get only the "name" property.

Working With Files and Directories

You can do pretty much everything you're used to: navigating to various directories, listing files and sub-directories, examining the content of files, creating directories and files, etc.

PS C:\Users\the_g> mkdir test_dir | select name

Name    
----    
test_dir                                                                                                                        

PS C:\Users\the_g> cd .\test_dir

PS C:\Users\the_g\test_dir> "123" > test.txt

PS C:\Users\the_g\test_dir> ls | select name

Name
----
test.txt

PS C:\Users\the_g\test_dir> get-content .\test.txt
123

Working With Other Providers

With PowerShell, you can treat many things as file systems and navigate them using cd and check their contents using ls/dir. Here are some additional providers:

Provider      Drive         Data store
--------      -----         ----------
Alias         Alias:        Windows PowerShell aliases
Certificate   Cert:         x509 certificates for digital signatures
Environment   Env:          Windows environment variables
Function      Function:     Windows PowerShell functions
Registry      HKLM:, HKCU:  Windows registry
Variable      Variable:     Windows PowerShell variables
WSMan         WSMan:        WS-Management configuration information

Let's check out the environment and see what Go-related environment variables are out there (on my machine):

PS C:\Users\the_g> ls env:GO*

Name   Value
----   -----
GOROOT C:\GO\                        
GOPATH C:\Users\the_g\Documents\Go

Formatting

PowerShell encourages composing cmdlets with standard switches and creating pipelines. Formatting is an explicit concept: you put a formatter at the end of a pipeline. By default, PowerShell examines the type of object or objects at the end of the pipe and applies a default formatter, but you can override it by specifying a formatter yourself. Formatters are just cmdlets. Here is the previous output displayed in list format:

PS C:\Users\the_g> ls env:GO* | Format-List

Name  : GOROOT
Value : C:\Go\

Name  : GOPATH
Value : c:\Users\the_g\Documents\Go

The Profile

Power users that use the command line frequently have many tasks, pipelines, and favorite combinations of commands with default switches that they favor. The PowerShell profile is a PowerShell script file that is loaded and executed whenever you start a new session. You can put all your favorite goodies there, create aliases and functions, set environment variables, and pretty much everything else.

I like to create navigation aliases to deeply nested directories, activate Python virtual environments, and create shortcuts to external commands I run frequently, like git and docker.

For me, the profile is indispensable because PowerShell's very readable and consistent commands and switches are often too verbose, and the built-in aliases are often more trouble than help (I discuss this later). Here is a very partial snippet from my profile:

#---------------------------
#
#   D O C K E R
#
#---------------------------
Set-Alias -Name d -Value docker

function di { d images }
#---------------------------
#
#   G I T
#
#---------------------------
Set-Alias -Name g -Value git
function gs { g status }
function gpu { g pull --rebase }

#-------------------------
#
#   C O N D A
#
#-------------------------
function a { activate.ps1 $args[0] }

#------------------------
#
#   N A V I G A T I O N
#
#------------------------

function cdg { cd $github_dir }
# MVP 
function cdm { a ov; cdg; cd MVP }

# backend 
function cdb { a ov; cdg; cd backend }

# scratch
function cds { a ov; cdg; cd scratch }

# backend packages
function cdbp { cdb; cd packages }

# Go workspace
function cdgo { cd $go_src_dir }

The Prompt

PowerShell lets you customize your command prompt. You need to define a function called prompt(). You can see the built-in prompt function:

PS C:\Users\the_g> gc function:prompt

"PS $($executionContext.SessionState.Path.CurrentLocation)$('>' * ($nestedPromptLevel + 1)) ";
# .Link
# http://ift.tt/1t3sOF2
# .ExternalHelp System.Management.Automation.dll-help.xml


PS C:\Users\the_g>

Here is a custom prompt function that displays the current time in addition to the current directory:

PS C:\Users\the_g> function prompt {"$(get-date) $(get-location) > "}

10/09/2016 12:42:36 C:\Users\the_g >

You can go wild, of course, and add colors and check various conditions like if you're in a particular git repository or if you're admin.

Aliases: The Dark Side

PowerShell got aliases wrong, in my opinion, on two separate fronts. First, the alias command only allows the renaming of commands. You can't add common flags or options to make commands more useful by aliasing them to themselves.

For example, if you want to search in text line by line, you can use the Select-String cmdlet:

# Create a little text file with 3 lines
@"
ab
cd
ef
"@ > 1.txt

# Search for a line containing d
Get-Content 1.txt | Select-String d 

cd

That works, but many people would like to rename Select-String to grep. But grep is by default case-sensitive, while Select-String is not. No big deal—we'll just add the -CaseSensitive flag, as in:

Set-Alias -Name grep -Value "Select-String -CaseSensitive"

Unfortunately, that doesn't work:

16:19:26 C:\Users\the_g> Get-Content 1.txt | grep D
grep : The term 'Select-String -CaseSensitive' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or
if a path was included, verify that the path is correct and try again.
At line:1 char:21
+ Get-Content 1.txt | grep D
+                     ~~~~
    + CategoryInfo          : ObjectNotFound: (Select-String -CaseSensitive:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

The value of an alias must be either a cmdlet, a function, a script, or a program. No flags or arguments are allowed.

Now, you can do that very easily in PowerShell, but you'll have to use functions and not aliases. That pretty much constrains aliases to simple renaming, which can also be done by functions.

PowerShell vs. Bash

On the interactive shell side, PowerShell and Bash are pretty equal. Bash is more concise by default, but PowerShell's object pipeline makes complicated pipelines more manageable.

That said, you can probably accomplish anything with either one and if you're a power user then you'll have your own aliases, functions, and shortcuts for common tasks. On the scripting side, PowerShell goes far beyond Bash, and for system administration purposes it even beats Python, Ruby and friends.

An important aspect is availability. Bash comes pre-installed with most *nix distributions (unless specifically stripped) including macOS. It can also be installed on Windows via cygwin, git-bash, or msys. PowerShell comes pre-installed on Windows and just recently became available on Mac and Linux.

Conclusion

If you use Windows as a development machine or if you manage Windows machines then PowerShell is an indispensable tool. It is truly a well thought out superset of the Unix shells, and it comes pre-installed.

PowerShell is great software engineering at work. It evolved over a decade, and it kept innovating while maintaining its original conceptual integrity. The recent switch to open source and cross-platform signals that there is still a lot more to look forward to.

100 Insanely Awesome Fonts From Envato Elements

How To Take Photos of People Like a Professional

Saturday, December 17, 2016

The Portable Guitarist—Amps, Cables and Connectors

This series has so far explained the advantages of an iOS-based rig, the gear you need, transporting it, and setting it up for live performance.

This tutorial is about amplification options plus the relevant cables and connectors.

Mono In; Stereo Out

Look at the guitar lead—there’s usually a ring below the tip, indicating a MONO, or single signal lead.

Two rings equals stereo—you’ve got the wrong lead. This is because a guitar produces a mono analogue signal. Consequently, all of the traditional gear that you’d plug a guitar into—pedals, amps and so on—come equipped with mono input sockets.

The iPad’s headphone socket, however, is a STEREO, or dual signal output. Furthermore, the output accepts a 1/8” (3.5mm) jack, whereas traditional guitar inputs accept a 1/4” (6.35mm) jack.

Simply put, the iPad produces too big a signal on too small a connector.

Thankfully, this can be overcome with the correct connections and leads. To make the right choice, however, you’ll need to decide what you’re plugging into: guitar amp, or PA (Public Address system).

Guitar Amp

Unless you own a stereo amp—such as the Roland Jazz Chorus, or run a two-amp set-up—you’ve the aforementioned stereo-into-mono problem.

iPad to the Rescue

Luckily, an iPad can output in mono as well as stereo. To do this, go to Settings, and select General.

Choose Accessibility, and scroll down to Hearing. You’ll find a slide button that activates Mono output. Try it through a mono cable into a single speaker; you’ll notice a louder, more detailed sound than when it’s in stereo mode.

Talking of cables…

Cable Conundrum

Online you’ll find any number of Y-cables that combine two mono signals into a single stereo signal.

Search ‘stereo to mono cable’, however, and you’ll get a cable consisting of one stereo connection and two mono connections. You’ll struggle to find a cable that has a single connection at either end.

Thankfully, there’s some good news.

Adaptors

You can run a mono cable from a stereo output (the iPad) to a mono input (the amp). You’ll need a stereo 1/8” jack adaptor, however, on one end of the 1/4” guitar cable for the iPad’s headphone socket.

You can also use a 1/8” jack stereo cable, but you’ll need a stereo-to-mono 1/4” adaptor on one end to plug into the amp.

These adaptors cost just a few pounds, and are found easily on Amazon and eBay.

A robust—but more expensive—solution is the iLine Mono Output Adaptor from IK Multimedia, for under £25. For around £50, it comes as part of the larger and very useful iLine Mobile Music Cable Kit.

Front End, Effects Loop

If the amp has an effects loop, you could use the iPad purely as an effects unit using cables and connectors described above.

Some words of caution:

The amp’s overall output will be determined by the iPad, so you’ll need to turn that up
You’ll notice a level of noise that’s higher than when plugging into the front of the amp. You may not notice it when playing, but it’ll be there when you’re not. Whether you proceed will be determined by how loud the gig is, and how much noise you or your audience can take

I’d choose to plug into the front end, but this also presents issues:

Careful with the iPad’s volume; the louder you go, the harder you’ll drive the amp, which leads to distortion
Like any effects pedal, its position in the signal chain affects how it performs. An expansive reverb may sound fine on a clean sound, but could get messy with distortion

PA

When I started using an iPad live, I never used a guitar amp. Why use one amp, when apps provide the sounds of many? Instead, I used a PA. It presents some new challenges, but solves a lot of problems.

Stereo Compatibility

Unlike a guitar amp, a PA accepts an array of input sources and connections. Of interest here are line input jack sockets, which are typically stereo compatible.

You could therefore connect your iPad with a 1/8” to 1/4” stereo cable. These are plentiful, and can be very cheap. Get the longest one you can, as you never know from gig to gig how far apart your equipment could be.

Double Up

Some line sockets accept stereo or mono jacks, so I run a 1/8” stereo Y-cable that terminates with two 1/4” mono jacks.

Each of these plug into separate channels, as two preamps means more output. This lets me lower the iPad’s volume, giving a cleaner input signal.

Effects

Time-based effects like delay and reverb consume the iPad’s processing power. If your PA has inbuilt effects—that you like the sound of—employ them.

Mine or Yours

I arrange my PA like a traditional guitar amp stack, placed behind me. If you’re worried about getting sound out front, most PA units have a monitor output; simply run a lead from it to your band’s PA.

However, if you don’t own a PA, you’ll have to plug into your band’s one. If so, consider these points:

Your distance from the PA is the length of your lead
As cables lengthen, you run into capacitance issues, causing treble loss—the longer the cable, the more muffled you sound
If you can’t hear yourself then you need a powered monitor speaker

Conclusion

In the mono realm of guitarists, the stereo iPad can seem like a baffling choice, but you can make it work provided you:

Understand what’s mono and what’s stereo
Get the right cables and connectors
Choose your amplification wisely
A guitar amp’s front end is quieter than its effects loop
A PA has more options
Your own PA is easier than using your band’s

In the next tutorial, I'll explain the world of apps.

How to Use an iPhone Outdoors

How to Create a Space Scene With Element 3D

Friday, December 16, 2016

Get Started With Retrofit 2 HTTP Client

New Course: JavaScript for Web Designers

Illustrator in 60 Seconds: How to Install and Use a Custom Swatch Pattern

Installing the Google PageSpeed Module

The Beginner’s Guide to Unusual and Unique Printing Methods

Awesome Actions: How to Create an Oil Painting Photo Effect

Thursday, December 15, 2016

Why You Should Be Using Rem-Based Layouts

How to Create a Festive Mandala Style Coloring Book Page in Adobe Illustrator

The Best Photoshop (PSD) Website Templates of 2016

Creating a Custom WordPress Messaging System, Part 2

Which Type of Photography Portfolio Should You Use?

How to Create a Geometric Collage Text Effect in Adobe Photoshop

Wednesday, December 14, 2016

Practical Animation Examples in React Native

What is Passive Income and How Does it Actually Work?

New Course: Get Started With NativeScript and Mobile Angular 2

How to Organize Your Gmail Inbox to Be More Effective

Common React Native App Layouts: News Feed

Building Your First Web Scraper, Part 1

Rubyland has two gems that have occupied the web scraping spotlight for the past few years: Nokogiri and Mechanize. We'll spend an article on each of these before we put them into action with a practical example.

Topics

Web Scraping?
Permission
The Problem
Nokogiri
Extraction?
Pages
API
Node Navigation

Web Scraping?

There are fancier terms around than web or screen scraping. Web harvesting and web data extraction pretty much tell you right away what’s going on. We can automate the extraction of data from web pages, and it’s not that complicated either.

In a way, these tools allow you to imitate and automate human web browsing. You write a program that only extracts the sort of data that is of interest to you. Targeting specific data is almost as easy as using CSS selectors.

A few years ago I subscribed to some online video course that had like a million short videos but no option to download them in bulk. I had to go through every link on my own and do the dreaded ‘save as’ myself. It was sort of human web scraping—something that we often need to do when we lack the knowledge to automate that kind of stuff. The course itself was alright, but I didn’t use their services anymore after that. It was just too tedious.

Today, I wouldn’t care too much about such mind-melting UX. A scraper that would do the downloading for me would take me only a couple of minutes to throw together. No biggie!

Let me break it down real quick before we start. The whole thing can be condensed into a couple of steps. First we fetch a web page that has the desired data we need. Then we search through that page and identify the information we want to extract.

The final step is to target these bits, slice them if necessary, and decide how and where you want to store them. Well-written HTML is often key to making this process easy and enjoyable. For more involved extractions, it can be a pain if you have to deal with poorly structured markup.

What about APIs? Very good question. If you have access to a service with an API, there is often little need to write your own scraper. This approach is mostly for websites that don’t offer that sort of convenience. Without an API, this is often the only way to automate the extraction of information from websites.

You might ask, how does this scraping thing actually work? Without jumping into the deep end, the short answer is, by traversing tree data structures. Nokogiri builds these data structures from the documents you feed it and lets you target bits of interest for extraction. For example, CSS is a language written for tree traversal, for searching tree data structures, and we can make use of it for data extraction.

There are many approaches and solutions out there to play with. Rubyland has two gems that have occupied the spotlight for a number of years now. Many people still rely on Nokogiri and Mechanize for HTML scraping needs. Both have been tested and proven themselves to be easy to use while being highly capable. We will look at both of them. But before that, I’d like to take a moment to address the problem that we are going to solve at the end of this short introductory series.

Permission

Before you start scraping away, make sure you have the permission of the sites you are trying to access for data extraction. If the site has an API or RSS feed, for example, it might not only be easier to get that desired content, it might also be the legal option of choice.

Not everybody will appreciate it if you do massive scraping on their sites—understandably so. Get yourself educated on that particular site you are interested in, and don’t get yourself in trouble. Chances are low that you will inflict serious damage, but risking trouble unknowingly is not the way to go.

The Problem

I needed to build a new podcast. The design was not where I wanted it to be, and I hated the way of publishing new posts. Damn WYSIWYGs! A little bit of context. About two years ago, I built the first version of my podcast. The idea was to play with Sinatra and build something super lightweight. I ran into a couple of unexpected issues since I tailor-made pretty much everything.

Coming from Rails, it was definitely an educational journey that I appreciate, but I quickly regretted not having used a static site that I could have deployed through GitHub via GitHub pages. Deploying new episodes and maintaining them lacked the simplicity that I was looking for. For a while, I decided that I had bigger fish to fry and focused on producing new podcast material instead.

This past summer I started to get serious and worked on a Middleman site that is hosted via GitHub pages. For season two of the show, I wanted something fresh. A new, simplified design, Markdown for publishing new episodes, and no fist fights with Heroku—heaven! The thing was that I had 139 episodes lying around that needed to be imported and converted first in order to work with Middleman.

For posts, Middleman uses .markdown files that have so-called frontmatter for data, which basically replaces my database. Doing this transfer by hand is not an option for 139 episodes. That’s what computation is for. I needed to figure out a way to parse the HTML of my old website, scrape the relevant content, and transfer it to blog posts that I use for publishing new podcast episodes on Middleman.

Therefore, over the next three articles, I’m going to introduce you to the tools commonly used in Rubyland for such tasks. In the end, we’ll go over my solution to show you something practical as well.

Nokogiri

Even if you are completely new to Ruby/Rails, chances are very good that you have already heard about this little gem. The name is dropped often and sticks with you easily. I'm not sure that many know that nokogiri is Japanese for “saw”.

It's a fitting name once you understand what the tool does. The creator of this gem is the lovely Tenderlove, Aaron Patterson. Nokogiri converts XML and HTML documents into a data structure—a tree data structure, to be more precise. The tool is fast and offers a nice interface as well. Overall, it’s a very potent library that takes care of a multitude of your HTML scraping needs.

You can use Nokogiri not only for parsing HTML; XML is fair game as well. It offers two interfaces for traversing the documents you load: XML Path Language and CSS selectors. XML Path Language, or XPath for short, is a query language.

It allows us to select nodes from XML documents. CSS selectors are most likely more familiar to beginners. As with styles you write, CSS selectors make it fantastically easy to target specific sections of pages that are of interest for extraction. You just need to let Nokogiri know what you are after when you target a particular destination.

Pages

What we always need to start with is fetching the actual page we are interested in. We specify what kind of Nokogiri document we want to parse—XML or HTML for example:

Nokogiri::XML

Nokogiri::HTML

some_scraper.rb

require "nokogiri"

require "open-uri"

page = Nokogiri::XML(File.open("some.xml"))

page = Nokogiri::HTML(File.open("some.html"))

Nokogiri::XML and Nokogiri::HTML can take IO objects or String objects. What happens above is straightforward: File.open returns an IO object for a local file, and Nokogiri loads its structure, its XML or HTML, into a new Nokogiri document. To fetch a remote page instead, we can hand Nokogiri the IO object that open-uri gives us. XML is not something beginners have to deal with very often.

Therefore, I’d recommend that we focus on HTML parsing for now. Why open-uri? This module from the Ruby Standard Library lets us grab the site without much fuss. Because IO objects are fair game, we can make easy use of open-uri.

API

Let’s put this into practice with a mini example:

at_css

some_podcast_scraper.rb

require 'nokogiri'

require "open-uri"

url = 'http://ift.tt/1Eqv5Ua'

# URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0 and later.
page = Nokogiri::HTML(URI.open(url))

header = page.at_css("h2.post-title")

title = header.text

puts "This is the raw header of the latest episode: #{header}"

puts "This is the title of the latest episode: #{title}"

What we did here represents all the steps that are usually involved in web scraping, just at a micro level. We decide which URL we need, fetch that page, and load it into a new Nokogiri document. Then we target a specific section of it.

Here I only wanted to know the title of the latest episode. Using the at_css method and a CSS selector for h2.post-title was all I needed to target the extraction point. With this method we will only scrape the first element that matches, though. This gives us the whole element, markup included, which is most of the time not exactly what we need. Therefore we extract only the inner text portion of this node via the text method. For comparison, you can check the output for both the header and the text below.

Output

This is the raw header of the latest episode: <h2 class="post-title"><a href="episodes/142/">David Heinemeier Hansson</a></h2>

This is the title of the latest episode: David Heinemeier Hansson

Although this example has very limited applications, it possesses all the ingredients, all the steps that you need to understand. I think it’s cool how simple this is. Because it might not be obvious from this example, I would like to point out how powerful this tool can be. Let’s see what else we can do with a Nokogiri script.

Attention!

If you are a beginner and not sure how to target the HTML needed for this, I recommend that you search online to find out how to inspect the contents of websites in your browser. Basically, all major browsers make this process really easy these days.

On Chrome you just need to right-click on an element in the website and choose the inspect option. This will open a small window at the bottom of your browser which shows you something like an x-ray of the site’s DOM. It has many more options, and I would recommend spending some time on Google to educate yourself. This is time spent wisely!

css

The css method will give us not only a single element of choice but every element on the page that matches the search criteria. Pretty neat and straightforward!

some_scraper.rb

require 'nokogiri'

require "open-uri"

url = 'http://ift.tt/1Eqv5Ua'

page = Nokogiri::HTML(URI.open(url))

headers = page.css("h2.post-title")

headers.each do |header|
  puts "This is the raw title of the latest episode: #{header}"
end

headers.each do |header|
  puts "This is the title of the latest episode: #{header.text}"
end

Output

This is the raw title of the latest episode: <h2 class="post-title"><a href="episodes/142/">David Heinemeier Hansson</a></h2>
This is the raw title of the latest episode: <h2 class="post-title"><a href="episodes/141/">Zach Holman</a></h2>
This is the raw title of the latest episode: <h2 class="post-title"><a href="episodes/140/">Joel Glovier</a></h2>
This is the raw title of the latest episode: <h2 class="post-title"><a href="episodes/139/">João Ferreira</a></h2>
This is the raw title of the latest episode: <h2 class="post-title"><a href="episodes/138/">Corwin Harrell</a></h2>
This is the raw title of the latest episode: <h2 class="post-title"><a href="episodes/137/">Roberto Machado</a></h2>
This is the raw title of the latest episode: <h2 class="post-title"><a href="episodes/136/">James Edward Gray II</a></h2>

This is the title of the latest episode: David Heinemeier Hansson
This is the title of the latest episode: Zach Holman
This is the title of the latest episode: Joel Glovier
This is the title of the latest episode: João Ferreira
This is the title of the latest episode: Corwin Harrell
This is the title of the latest episode: Roberto Machado
This is the title of the latest episode: James Edward Gray II

The only little difference in this example is that I iterate over the headers twice: first to print each raw header, then to print only its inner text, extracted with the text method. Nokogiri stops at the end of the page and does not attempt to follow the pagination anywhere automatically.

Let’s say we want to have a bit more information, say the date and the subtitle for each episode. We can simply expand on the example above. It is a good idea anyway to take this step by step. Get a little piece working and add in more complexity along the way.

some_scraper.rb

require 'nokogiri'

require "open-uri"

url = 'http://ift.tt/1Eqv5Ua'

page = Nokogiri::HTML(URI.open(url))

articles = page.css("article.index-article")

articles.each do |article|
  header     = article.at_css("h2.post-title")
  date       = article.at_css(".post-date")
  subtitle   = article.at_css(".topic-list")

  puts "This is the raw header:    #{header}"
  puts "This is the raw date:      #{date}"
  puts "This is the raw subtitle:  #{subtitle}\n\n"
 
  puts "This is the text header:   #{header.text}"
  puts "This is the text date:     #{date.text}"
  puts "This is the text subtitle: #{subtitle.text}\n\n"
end

Output

This is the raw header: <h2 class="post-title"><a href="episodes/142/">David Heinemeier Hansson</a></h2>
This is the raw date: <span class="post-date">Oct 18 | 2016</span>
This is the raw subtitle: <h3 class="topic-list">Rails community | Tone | Technical disagreements | Community policing | Ungratefulness | No assholes allowed | Basecamp | Open source persona | Aspirations | Guarding motivations | Dealing with audiences | Pressure | Honesty | Diverse opinions | Small talk</h3>

This is the text header: David Heinemeier Hansson
This is the text date: Oct 18 | 2016
This is the text subtitle: Rails community | Tone | Technical disagreements | Community policing | Ungratefulness | No assholes allowed | Basecamp | Open source persona | Aspirations | Guarding motivations | Dealing with audiences | Pressure | Honesty | Diverse opinions | Small talk

This is the raw header: <h2 class="post-title"><a href="episodes/141/">Zach Holman</a></h2>
This is the raw date: <span class="post-date">Oct 12 | 2016</span>
This is the raw subtitle: <h3 class="topic-list">Getting Fired | Taboo | Transparency | Different Perspectives | Timing | Growth Stages | Employment &amp; Dating | Managers | At-will Employment | Tech Industry | Europe | Low hanging Fruits | Performance Improvement Plans | Meeting Goals | Surprise Firings | Firing Fast | Mistakes | Company Culture | Communication</h3>

This is the text header: Zach Holman
This is the text date: Oct 12 | 2016
This is the text subtitle: Getting Fired | Taboo | Transparency | Different Perspectives | Timing | Growth Stages | Employment & Dating | Managers | At-will Employment | Tech Industry | Europe | Low hanging Fruits | Performance Improvement Plans | Meeting Goals | Surprise Firings | Firing Fast | Mistakes | Company Culture | Communication

This is the raw header: <h2 class="post-title"><a href="episodes/140/">Joel Glovier</a></h2>
This is the raw date: <span class="post-date">Oct 10 | 2016</span>
This is the raw subtitle: <h3 class="topic-list">Digital Product Design | Product Design @ GitHub | Loving Design | Order &amp; Chaos | Drawing | Web Design | HospitalRun | Diversity | Startup Culture | Improving Lives | CURE International | Ember | Offline First | Hospital Information System | Designers &amp; Open Source</h3>

This is the text header: Joel Glovier
This is the text date: Oct 10 | 2016
This is the text subtitle: Digital Product Design | Product Design @ GitHub | Loving Design | Order & Chaos | Drawing | Web Design | HospitalRun | Diversity | Startup Culture | Improving Lives | CURE International | Ember | Offline First | Hospital Information System | Designers & Open Source

This is the raw header: <h2 class="post-title"><a href="episodes/139/">João Ferreira</a></h2>
This is the raw date: <span class="post-date">Aug 26 | 2015</span>
This is the raw subtitle: <h3 class="topic-list">Masters @ Work | Subvisual | Deadlines | Design personality | Design problems | Team | Pushing envelopes | Delightful experiences | Perfecting details | Company values</h3>

This is the text header: João Ferreira
This is the text date: Aug 26 | 2015
This is the text subtitle: Masters @ Work | Subvisual | Deadlines | Design personality | Design problems | Team | Pushing envelopes | Delightful experiences | Perfecting details | Company values

This is the raw header: <h2 class="post-title"><a href="episodes/138/">Corwin Harrell</a></h2>
This is the raw date: <span class="post-date">Aug 06 | 2015</span>
This is the raw subtitle: <h3 class="topic-list">Q&amp;A | 01 | University | Graphic design | Design setup | Sublime | Atom | thoughtbot | Working location | Collaboration &amp; pairing | Vim advocates | Daily routine | Standups | Clients | Coffee walks | Investment Fridays |</h3>

This is the text header: Corwin Harrell
This is the text date: Aug 06 | 2015
This is the text subtitle: Q&A | 01 | University | Graphic design | Design setup | Sublime | Atom | thoughtbot | Working location | Collaboration & pairing | Vim advocates | Daily routine | Standups | Clients | Coffee walks | Investment Fridays |

This is the raw header: <h2 class="post-title"><a href="episodes/137/">Roberto Machado</a></h2>
This is the raw date: <span class="post-date">Aug 03 | 2015</span>
This is the raw subtitle: <h3 class="topic-list">CEO @ Subvisual | RubyConf Portugal | Creators School | Consultancy | Company role models | Group Buddies | Portuguese startup | Rebranding | Technologies used | JS frameworks | TDD &amp; BDD | Startup mistakes | Culture of learning | Young entrepreneurs</h3>

This is the text header: Roberto Machado
This is the text date: Aug 03 | 2015
This is the text subtitle: CEO @ Subvisual | RubyConf Portugal | Creators School | Consultancy | Company role models | Group Buddies | Portuguese startup | Rebranding | Technologies used | JS frameworks | TDD & BDD | Startup mistakes | Culture of learning | Young entrepreneurs

This is the raw header: <h2 class="post-title"><a href="episodes/136/">James Edward Gray II</a></h2>
This is the raw date: <span class="post-date">Jul 30 | 2015</span>
This is the raw subtitle: <h3 class="topic-list">Screencasting | Less Code | Reading code | Getting unstuck | Rails’s codebase | CodeNewbie | Small examples | Future plans | PeepCode | Frequency &amp; pricing</h3>

This is the text header: James Edward Gray II
This is the text date: Jul 30 | 2015
This is the text subtitle: Screencasting | Less Code | Reading code | Getting unstuck | Rails’s codebase | CodeNewbie | Small examples | Future plans | PeepCode | Frequency & pricing

At this point, we already have some data to play with. We can structure or butcher it any way we like. The above should simply show what we have in a readable fashion. Of course we can dig deeper into each of these by using regular expressions with the text method.

We will look into this in a lot more detail when we get to solving the actual podcast problem. It won’t be a class on regexp, but you will see some more of it in action. No worries, though: not so much as to make your brain bleed.
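As a small taste, here is a sketch of what a regular expression can do with the strings that text returns. It works on two sample strings copied from the output above rather than on a live page, and it is not the solution used later in this series.

```ruby
require "date"

# Sample strings as returned by the text method in the output above.
date_text     = "Oct 18 | 2016"
subtitle_text = "Rails community | Tone | Technical disagreements"

# Named captures pull month, day, and year out of the date string.
match = date_text.match(/\A(?<month>\w{3}) (?<day>\d{2}) \| (?<year>\d{4})\z/)
date  = Date.strptime("#{match[:month]} #{match[:day]} #{match[:year]}", "%b %d %Y")

# Split the topic list on the pipe, ignoring the whitespace around it.
topics = subtitle_text.split(/\s*\|\s*/)

puts date.iso8601
puts topics.inspect
```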

Attributes

What could be handy at this stage is extracting the href for the individual episodes as well. It couldn’t be any simpler.

some_scraper.rb

require 'nokogiri'

require "open-uri"

url = 'http://ift.tt/1Eqv5Ua'

page = Nokogiri::HTML(URI.open(url))

articles = page.css("article.index-article")

articles.each do |article|
  header      = article.at_css("h2.post-title")
  date        = article.at_css(".post-date")
  subtitle    = article.at_css(".topic-list")
  link        = article.at_css("h2.post-title a")
  podcast_url = "http://ift.tt/1Eqv5Ua"

  puts "This is the raw header:    #{header}"
  puts "This is the raw date:      #{date}"
  puts "This is the raw subtitle:  #{subtitle}"
  puts "This is the raw link:      #{link}\n\n"

  puts "This is the text header:   #{header.text}"
  puts "This is the text date:     #{date.text}"
  puts "This is the text subtitle: #{subtitle.text}"
  puts "This is the href:          #{podcast_url}#{link[:href]}\n\n"
end

The most important bits to pay attention to here are [:href] and podcast_url. If you tag [:attribute_name] onto a node, here [:href], you can simply extract that attribute from the targeted element. I abstracted a little further, but you can see more clearly how it works below.

...

href = article.at_css("h2.post-title a")[:href]

...

To get a complete and useful URL, I saved the root domain in a variable and constructed the full URL for each episode.

...

podcast_url = "http://ift.tt/1Eqv5Ua"

puts "This is the href: #{podcast_url}#{link[:href]}\n\n"

...
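String interpolation works fine as long as the root URL ends with a slash and the href is relative. As an alternative sketch, Ruby's standard library offers URI.join, which resolves relative links for you. The base URL below is a placeholder, not the real address of the podcast.

```ruby
require "uri"

# Placeholder base URL, assumed for illustration only.
base_url = "http://example.com/podcast/"

relative_href = "episodes/143/"   # relative to the current directory
rooted_href   = "/episodes/143/"  # relative to the site root

full_url   = URI.join(base_url, relative_href).to_s
rooted_url = URI.join(base_url, rooted_href).to_s

puts full_url
puts rooted_url
```

The first link resolves to http://example.com/podcast/episodes/143/ and the second to http://example.com/episodes/143/, which is the same behavior a browser would show when following those links.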

Let’s take a quick look at the output:

Output

This is the raw header:   <h2 class="post-title"><a href="episodes/143/">Jason Long</a></h2>
This is the raw date:     <span class="post-date">Oct 25 | 2016</span>
This is the raw subtitle: <h3 class="topic-list">Open source | Empathy | Lower barriers | Learning tool | Design contributions | Git website | Branding | GitHub | Neovim | Tmux | Design love | Knowing audiences | Showing work | Dribbble | Progressions | Ideas</h3>
This is the raw link:     <a href="episodes/143/">Jason Long</a>

This is the text header: Jason Long
This is the text date:   Oct 25 | 2016
This is the text subtitle: Open source | Empathy | Lower barriers | Learning tool | Design contributions | Git website | Branding | GitHub | Neovim | Tmux | Design love | Knowing audiences | Showing work | Dribbble | Progressions | Ideas
This is the href:     http://ift.tt/2gA5f61

This is the raw header:   <h2 class="post-title"><a href="episodes/142/">David Heinemeier Hansson</a></h2>
This is the raw date:     <span class="post-date">Oct 18 | 2016</span>
This is the raw subtitle: <h3 class="topic-list">Rails community | Tone | Technical disagreements | Community policing | Ungratefulness | No assholes allowed | Basecamp | Open source persona | Aspirations | Guarding motivations | Dealing with audiences | Pressure | Honesty | Diverse opinions | Small talk</h3>
This is the raw link:     <a href="episodes/142/">David Heinemeier Hansson</a>

This is the text header: David Heinemeier Hansson
This is the text date:   Oct 18 | 2016
This is the text subtitle: Rails community | Tone | Technical disagreements | Community policing | Ungratefulness | No assholes allowed | Basecamp | Open source persona | Aspirations | Guarding motivations | Dealing with audiences | Pressure | Honesty | Diverse opinions | Small talk
This is the href:     http://ift.tt/2hEll3v

This is the raw header:   <h2 class="post-title"><a href="episodes/141/">Zach Holman</a></h2>
This is the raw date:     <span class="post-date">Oct 12 | 2016</span>
This is the raw subtitle: <h3 class="topic-list">Getting Fired | Taboo | Transparency | Different Perspectives | Timing | Growth Stages | Employment &amp; Dating | Managers | At-will Employment | Tech Industry | Europe | Low hanging Fruits | Performance Improvement Plans | Meeting Goals | Surprise Firings | Firing Fast | Mistakes | Company Culture | Communication</h3>
This is the raw link:     <a href="episodes/141/">Zach Holman</a>

This is the text header: Zach Holman
This is the text date:   Oct 12 | 2016
This is the text subtitle: Getting Fired | Taboo | Transparency | Different Perspectives | Timing | Growth Stages | Employment & Dating | Managers | At-will Employment | Tech Industry | Europe | Low hanging Fruits | Performance Improvement Plans | Meeting Goals | Surprise Firings | Firing Fast | Mistakes | Company Culture | Communication
This is the href:     http://ift.tt/2dZ8mqu

This is the raw header:   <h2 class="post-title"><a href="episodes/140/">Joel Glovier</a></h2>
This is the raw date:     <span class="post-date">Oct 10 | 2016</span>
This is the raw subtitle: <h3 class="topic-list">Digital Product Design | Product Design @ GitHub | Loving Design | Order &amp; Chaos | Drawing | Web Design | HospitalRun | Diversity | Startup Culture | Improving Lives | CURE International | Ember | Offline First | Hospital Information System | Designers &amp; Open Source</h3>
This is the raw link:     <a href="episodes/140/">Joel Glovier</a>

This is the text header: Joel Glovier
This is the text date:   Oct 10 | 2016
This is the text subtitle: Digital Product Design | Product Design @ GitHub | Loving Design | Order & Chaos | Drawing | Web Design | HospitalRun | Diversity | Startup Culture | Improving Lives | CURE International | Ember | Offline First | Hospital Information System | Designers & Open Source
This is the href:     http://ift.tt/2hEsTmM

This is the raw header:   <h2 class="post-title"><a href="episodes/139/">João Ferreira</a></h2>
This is the raw date:     <span class="post-date">Aug 26 | 2015</span>
This is the raw subtitle: <h3 class="topic-list">Masters @ Work | Subvisual | Deadlines | Design personality | Design problems | Team | Pushing envelopes | Delightful experiences | Perfecting details | Company values</h3>
This is the raw link:     <a href="episodes/139/">João Ferreira</a>

This is the text header: João Ferreira
This is the text date:   Aug 26 | 2015
This is the text subtitle: Masters @ Work | Subvisual | Deadlines | Design personality | Design problems | Team | Pushing envelopes | Delightful experiences | Perfecting details | Company values
This is the href:     http://ift.tt/2gA8Lgr

This is the raw header:   <h2 class="post-title"><a href="episodes/138/">Corwin Harrell</a></h2>
This is the raw date:     <span class="post-date">Aug 06 | 2015</span>
This is the raw subtitle: <h3 class="topic-list">Q&amp;A | 01 | University | Graphic design | Design setup | Sublime | Atom | thoughtbot | Working location | Collaboration &amp; pairing | Vim advocates | Daily routine | Standups | Clients | Coffee walks | Investment Fridays |</h3>
This is the raw link:     <a href="episodes/138/">Corwin Harrell</a>

This is the text header: Corwin Harrell
This is the text date:   Aug 06 | 2015
This is the text subtitle: Q&A | 01 | University | Graphic design | Design setup | Sublime | Atom | thoughtbot | Working location | Collaboration & pairing | Vim advocates | Daily routine | Standups | Clients | Coffee walks | Investment Fridays |
This is the href:     http://ift.tt/2hEnDQ1

This is the raw header:   <h2 class="post-title"><a href="episodes/137/">Roberto Machado</a></h2>
This is the raw date:     <span class="post-date">Aug 03 | 2015</span>
This is the raw subtitle: <h3 class="topic-list">CEO @ Subvisual | RubyConf Portugal | Creators School | Consultancy | Company role models | Group Buddies | Portuguese startup | Rebranding | Technologies used | JS frameworks | TDD &amp; BDD | Startup mistakes | Culture of learning | Young entrepreneurs</h3>
This is the raw link:     <a href="episodes/137/">Roberto Machado</a>

This is the text header: Roberto Machado
This is the text date:   Aug 03 | 2015
This is the text subtitle: CEO @ Subvisual | RubyConf Portugal | Creators School | Consultancy | Company role models | Group Buddies | Portuguese startup | Rebranding | Technologies used | JS frameworks | TDD & BDD | Startup mistakes | Culture of learning | Young entrepreneurs
This is the href:     http://ift.tt/2gA6bHg

Neat, isn’t it? You can do the same to extract the [:class] of an element.

require 'nokogiri'

require "open-uri"

url = 'http://ift.tt/1Eqv5Ua'

page = Nokogiri::HTML(URI.open(url))

body_classes = page.at_css("body")[:class]

If that node has more than one class, you will get all of them back in a single space-separated string.

Node Navigation

parent
children
previous_sibling
next_sibling

We are used to dealing with tree structures in CSS or even jQuery. It would be a pain if Nokogiri didn't offer a handy API to move within such trees.

some_scraper.rb

require 'nokogiri'

require "open-uri"

url = 'http://ift.tt/1Eqv5Ua'

page = Nokogiri::HTML(URI.open(url))

header = page.at_css("h2.post-title")
header_children = page.at_css("h2.post-title").children
header_parent = page.at_css("h2.post-title").parent
header_prev_sibling = page.at_css("h2.post-title").previous_sibling

puts "#{header}\n\n"
puts "#{header_children}\n\n"
puts "#{header_parent}\n\n"
puts "#{header_prev_sibling}\n\n"

Output

#header
<h2 class="post-title"><a href="episodes/143/">Jason Long</a></h2>

#header_children
<a href="episodes/143/">Jason Long</a>

#header_parent
<article class="index-article">
  <span class="post-date">Oct 25 | 2016</span><h2 class="post-title"><a href="episodes/143/">Jason Long</a></h2>
    <h3 class="topic-list">Open source | Empathy | Lower barriers | Learning tool | Design contributions | Git website | Branding | GitHub | Neovim | Tmux | Design love | Knowing audiences | Showing work | Dribbble | Progressions | Ideas</h3>
    <div class="soundcloud-player-small">  
    </div>
</article>

#header_prev_sibling
<span class="post-date">Oct 25 | 2016</span>

As you can see for yourself, this is some pretty powerful stuff—especially when you see what .parent was able to collect in one go. Instead of defining a bunch of nodes by hand, you could collect them wholesale.

You can even chain them for more involved traversals. You can take this as complicated as you like, of course, but I would caution you to keep things simple. It can quickly get a little unwieldy and hard to understand. Remember, "Keep it simple, stupid!"

...

header_parent_parent = page.at_css("h2.post-title").parent.parent
header_prev_sibling_parent_children = page.at_css("h2.post-title").previous_sibling.parent.children

...

some_scraper.rb

require 'nokogiri'

require "open-uri"

url = 'http://ift.tt/1Eqv5Ua'

page = Nokogiri::HTML(URI.open(url))

header = page.at_css("h2.post-title")
header_prev_sibling_children = page.at_css("h2.post-title").previous_sibling.children
header_parent_parent = page.at_css("h2.post-title").parent.parent
header_prev_sibling_parent = page.at_css("h2.post-title").previous_sibling.parent
header_prev_sibling_parent_children = page.at_css("h2.post-title").previous_sibling.parent.children

puts "#{header}\n\n"
puts "#{header_prev_sibling_children}\n\n"
puts "#{header_parent_parent}\n\n"
puts "#{header_prev_sibling_parent}\n\n"
puts "#{header_prev_sibling_parent_children}\n\n"

Output

#header
<h2 class="post-title"><a href="episodes/143/">Jason Long</a></h2>

#header_prev_sibling_children
Oct 25 | 2016

#header_parent_parent
<li>
  <article class="index-article">
  <span class="post-date">Oct 25 | 2016</span><h2 class="post-title"><a href="episodes/143/">Jason Long</a></h2>
    <h3 class="topic-list">Open source | Empathy | Lower barriers | Learning tool | Design contributions | Git website | Branding | GitHub | Neovim | Tmux | Design love | Knowing audiences | Showing work | Dribbble | Progressions | Ideas</h3>
    <div class="soundcloud-player-small">  
    </div>
  </article>
</li>

#header_prev_sibling_parent
<article class="index-article">
  <span class="post-date">Oct 25 | 2016</span><h2 class="post-title"><a href="episodes/143/">Jason Long</a></h2>
    <h3 class="topic-list">Open source | Empathy | Lower barriers | Learning tool | Design contributions | Git website | Branding | GitHub | Neovim | Tmux | Design love | Knowing audiences | Showing work | Dribbble | Progressions | Ideas</h3>
    <div class="soundcloud-player-small">  
    </div>
</article>

#header_prev_sibling_parent_children
  <span class="post-date">Oct 25 | 2016</span><h2 class="post-title"><a href="episodes/143/">Jason Long</a></h2>
    <h3 class="topic-list">Open source | Empathy | Lower barriers | Learning tool | Design contributions | Git website | Branding | GitHub | Neovim | Tmux | Design love | Knowing audiences | Showing work | Dribbble | Progressions | Ideas</h3>
    <div class="soundcloud-player-small">  
    </div>

Final Thoughts

Nokogiri is not a huge library, but it has a lot to offer. I recommend you play with what you have learned thus far and expand your knowledge through its documentation when you hit a wall. But don’t get yourself into trouble!

This little intro should get you well on your way to understanding what you can do and how it works. I hope you will explore it a bit more on your own and have some fun with it. As you will find out on your own, it’s a rich tool that keeps on giving.