The Best Ways to Fix HTML Content Issues during Migration in Drupal 8

KrishaWeb
3 min read · Nov 6, 2017


In this blog post I will show you a technique for fixing HTML problems, importing images, or performing other content operations during migrations.

Many content migrations require fixing the source content first. This can be challenging when the source database contains a large number of entries. The powerful Drupal 8 Migration API offers sophisticated ways to handle this kind of problem.

To fix HTML problems, I consistently use process plugins. Here is an example of how you would call your own process plugin to fix HTML issues in the body field:

'field_body/value':
  -
    plugin: fix_html_issues
    images_source: '/minnur/www/source-images'
    images_destination: 'public://body-images/'
    source: post_content
  -
    plugin: skip_on_empty
    method: row

As you can see, I am stacking several process plugins for the field_body/value field migration. You can also pass custom parameters to your process plugin (in my case, the parameters are images_source and images_destination). You can add any number of process plugins, depending on your requirements.
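
To make the chaining concrete, here is a minimal plain-PHP sketch of how a stacked pipeline behaves: each step receives the previous step's output together with its own configuration. It does not use the Migrate API. The run_pipeline() function, the two callbacks, and the SkipRow exception are hypothetical stand-ins invented for illustration only.

```php
<?php

// Hypothetical stand-in for an exception that tells the caller to skip a row.
class SkipRow extends Exception {}

/**
 * Runs a value through a list of steps, in order.
 *
 * Each step is [callable, configuration array]. The output of one step
 * becomes the input of the next, mirroring a stacked process pipeline.
 */
function run_pipeline($value, array $steps) {
  foreach ($steps as $step) {
    list($callback, $configuration) = $step;
    $value = $callback($value, $configuration);
  }
  return $value;
}

// Step 1: a toy "fix HTML" step that reads a custom parameter from its configuration.
$fix_html = function ($html, array $configuration) {
  return str_replace('src="/', 'src="' . $configuration['images_destination'], $html);
};

// Step 2: abort the whole row when the incoming value is empty.
$skip_on_empty = function ($value, array $configuration) {
  if (empty($value)) {
    throw new SkipRow('Empty value, skipping row.');
  }
  return $value;
};

$steps = [
  [$fix_html, ['images_destination' => 'public://body-images/']],
  [$skip_on_empty, ['method' => 'row']],
];

echo run_pipeline('<img src="/a.jpg">', $steps), PHP_EOL;
```

The order of the steps matters: because the skip check runs last, it sees the value after the HTML step has already transformed it.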

Now let's look at the plugin code. Please note that all process plugins are stored in the src/Plugin/migrate/process directory of your migration module.

The plugin imports images into Drupal as media entities and replaces <img> tags with Drupal entity embed tags: <drupal-entity data-embed-button="embed_image"></drupal-entity>. Below is the source code of the plugin:

<?php

namespace Drupal\wp_migration\Plugin\migrate\process;

use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\MigrateExecutableInterface;
use Drupal\file\FileInterface;
use Drupal\migrate\Row;
use Drupal\media_entity\Entity\Media;
use Drupal\Core\Database\Database;
use Drupal\Component\Utility\Unicode;

/**
 * Imports images found in HTML as media entities and embeds them.
 *
 * @MigrateProcessPlugin(
 *   id = "fix_html_issues"
 * )
 */
class FixHTMLissues extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($html, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {

    // Values for the following variables are specified in the YAML file above.
    $images_source = $this->configuration['images_source'];
    $destination = $this->configuration['images_destination'];

    // Find every <img> tag in the source HTML.
    preg_match_all('/<img[^>]+>/i', $html, $result);

    foreach ($result[0] as $img_tag) {

      // Collect alt, title and src by name, because their order varies between tags.
      preg_match_all('/(alt|title|src)="([^"]*)"/i', $img_tag, $matches);
      $attributes = array_combine(array_map('strtolower', $matches[1]), $matches[2]);

      // Nothing to import when the tag has no src.
      if (empty($attributes['src'])) {
        continue;
      }
      $filepath = $attributes['src'];
      $filename = basename($filepath);

      if (!file_prepare_directory($destination, FILE_CREATE_DIRECTORY)) {
        continue;
      }

      // Absolute URLs are fetched directly; other paths are read from the local copy.
      if (filter_var($filepath, FILTER_VALIDATE_URL)) {
        $file_contents = file_get_contents($filepath);
      }
      else {
        $file_contents = file_get_contents($images_source . $filepath);
      }
      $new_destination = $destination . '/' . $row->getSourceProperty('id') . '-' . $filename;

      if (empty($file_contents)) {
        continue;
      }

      if ($file = file_save_data($file_contents, $new_destination, FILE_EXISTS_REPLACE)) {

        // Use the title attribute when present, otherwise reuse the alt text.
        $alt = !empty($attributes['alt']) ? $attributes['alt'] : '';
        $title = !empty($attributes['title']) ? $attributes['title'] : $alt;

        // Create media entity using saved file.
        $media = Media::create([
          'bundle' => 'image',
          'uid' => \Drupal::currentUser()->id(),
          'langcode' => \Drupal::languageManager()->getDefaultLanguage()->getId(),
          'status' => Media::PUBLISHED,
          'field_image' => [
            'target_id' => $file->id(),
            'alt' => Unicode::truncate($alt, 512),
            'title' => Unicode::truncate($title, 1024),
          ],
        ]);

        $media->save();
        $uuid = $this->getMediaUuid($file);

        // The JSON display settings must be entity-encoded inside the attribute.
        $embed = '<p><drupal-entity'
          . ' data-embed-button="embed_image"'
          . ' data-entity-embed-display="entity_reference:media_thumbnail"'
          . ' data-entity-embed-display-settings="{&quot;image_style&quot;:&quot;large&quot;,&quot;image_link&quot;:&quot;&quot;}"'
          . ' data-entity-type="media"'
          . ' data-entity-uuid="' . $uuid . '"></drupal-entity></p>';
        $html = str_replace($img_tag, $embed, $html);
      }
    }
    return $html;
  }

  /**
   * Get Media UUID by File ID.
   */
  protected function getMediaUuid(FileInterface $file) {
    $query = db_select('media__field_image', 'f', ['target' => 'default']);
    $query->innerJoin('media', 'm', 'm.mid = f.entity_id');
    $query->fields('m', ['uuid']);
    $query->condition('f.field_image_target_id', $file->id());
    $uuid = $query->execute()->fetchField();
    return $uuid;
  }

}
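
Two details in a plugin like this are easy to get wrong: the order of attributes in an <img> tag varies, and the JSON display settings must be entity-encoded when placed inside an HTML attribute. The sketch below runs without Drupal and shows one way to handle both. The function names extract_img_attributes() and build_embed_tag(), the sample markup, and the sample UUID are hypothetical and used only for illustration.

```php
<?php

/**
 * Returns the alt, title and src attributes of an <img> tag, keyed by name.
 */
function extract_img_attributes($img_tag) {
  // Require whitespace before the name so that data-src is not mistaken for src.
  preg_match_all('/\s(alt|title|src)\s*=\s*"([^"]*)"/i', $img_tag, $matches);
  // Lower-case the names so that ALT and alt end up under the same key.
  return array_combine(array_map('strtolower', $matches[1]), $matches[2]);
}

/**
 * Builds entity embed markup for a media UUID.
 */
function build_embed_tag($uuid, array $display_settings) {
  // Encode the settings as JSON, then escape the quotes as &quot; for HTML.
  $settings = htmlspecialchars(json_encode($display_settings), ENT_QUOTES, 'UTF-8');
  return '<drupal-entity data-embed-button="embed_image"'
    . ' data-entity-embed-display="entity_reference:media_thumbnail"'
    . ' data-entity-embed-display-settings="' . $settings . '"'
    . ' data-entity-type="media"'
    . ' data-entity-uuid="' . htmlspecialchars($uuid, ENT_QUOTES, 'UTF-8') . '"></drupal-entity>';
}

// The attributes are found by name even though src is not in a fixed position.
$attributes = extract_img_attributes('<img title="Sunset" src="/uploads/sunset.jpg" alt="A sunset">');
echo $attributes['src'], PHP_EOL;

// 'hypothetical-uuid' stands in for the UUID of a saved media entity.
echo build_embed_tag('hypothetical-uuid', ['image_style' => 'large', 'image_link' => '']), PHP_EOL;
```

Keying attributes by name avoids relying on a fixed position such as "the second match is the src", which breaks as soon as a tag lists its attributes in a different order.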

Process plugin code can get quite messy, and that is fine. Since it is only a small part of the overall migration, you don't want to spend time making it look nice and optimized. The best way to improve your code is to write more migrations and refine it over time.

I hope this was helpful and I would love to hear about your methods and options for content migration problems.
