This site may be hacked

I recently helped some friends out with their WordPress-based website. Unfortunately, Google had started listing their website in search results with the warning "This site may be hacked".

First I had a quick look around: thankfully all the software was up-to-date and the installation itself was auto-updating. There were no unexpected users, plugins, or other content. Nothing was on fire.

Google provided all the information I needed to get this fixed:

  1. Verify ownership of the website in Google Search Console. This is as simple as hosting an identifying file on the site and clicking a link.
  2. Review Google's findings. Google listed four URLs on my friends' site that it had identified as containing injected spam.
  3. Clean it up.
  4. Request a review.

Clean up

The website had suffered some HTML injection: absolutely positioned blocks of links to online sellers of junk. The blocks were not visible in a visual user agent, but would show up in a screen reader or to a search engine spider.
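To see why such a block is invisible on the page yet still visible to a spider, it helps to read the markup the way a parser does. The minimal sketch below uses Python's standard html.parser to collect links that sit inside a <div> whose inline style mentions position. The class name, the hidden_links helper and the sample HTML (including the left:-9999px offset) are hypothetical, and a legitimate position:relative would also be flagged, so treat hits as candidates for review rather than proof of spam.

```python
from html.parser import HTMLParser


class PositionedDivLinks(HTMLParser):
    """Collect hrefs of <a> tags found inside a <div style="...position...">.

    Hypothetical helper: assumes the injected blocks use an inline style
    attribute and that any <div> tags nested inside them are closed.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0           # > 0 while inside a flagged <div>
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth:
                self.depth += 1  # nested <div> inside a flagged block
            elif 'position' in (attrs.get('style') or '').lower():
                self.depth = 1   # entering a flagged block
        elif tag == 'a' and self.depth and attrs.get('href'):
            self.hidden_links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1


def hidden_links(html):
    parser = PositionedDivLinks()
    parser.feed(html)
    parser.close()
    return parser.hidden_links


sample = ('<p>Welcome!</p>'
          '<div style="position:absolute;left:-9999px">'
          '<a href="http://spam.example/a">cheap</a>'
          '<div><a href="http://spam.example/b">junk</a></div>'
          '</div>'
          '<a href="/about">About us</a>')
print(hidden_links(sample))  # ['http://spam.example/a', 'http://spam.example/b']
```

The ordinary /about link is ignored because it sits outside the positioned block; a browser would render none of the flagged links on screen, but a crawler reading the HTML sees all of them.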

I did some analysis of the content in the database and found that around 10% of it contained spam of this style. Based on a careful review of that content, I wrote an SQL query to select all the affected posts and a Python script to do the clean-up.
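The analysis step can be sketched as a pure function over (id, post_content) rows, the shape a SELECT on wp_posts returns. This is a minimal sketch, not the author's actual analysis: the spam_summary name, the non-greedy pattern and the sample rows are all hypothetical.

```python
import re

# Hypothetical pattern modelled on the injected markup; the non-greedy .*?
# stops each match at the nearest closing </div>, DOTALL lets it cross lines.
SPAM = re.compile(r'<div style="position.*?</div>', re.DOTALL)


def spam_summary(rows, pattern=SPAM):
    """Given (id, post_content) rows, return the ids of posts matching the
    spam pattern and the fraction of posts affected."""
    affected = [post_id for post_id, content in rows if pattern.search(content)]
    ratio = len(affected) / len(rows) if rows else 0.0
    return affected, ratio


rows = [
    (1, '<p>Opening hours</p>'),
    (2, '<p>News</p>'),
    (3, 'Hi<div style="position:absolute">buy pills</div>'),
    (4, '<p>Contact</p>'),
]
print(spam_summary(rows))  # ([3], 0.25)
```

Reviewing the matched posts by hand before writing any UPDATE is what makes it safe to turn the same pattern into a deletion.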

One interesting wrinkle: I found several spam blocks that had themselves been injected with spam. Rather than writing a more complex matching expression, I simply ran the script twice.
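Running the script twice works because re.sub scans its input once, left to right, and never re-examines text that its own deletions have joined together. A minimal sketch of generalising "run it twice" into "run it until nothing changes", assuming a non-greedy pattern; the strip_until_stable helper and the nested sample (an inner block injected into the middle of an outer block's style attribute) are hypothetical.

```python
import re

# Hypothetical non-greedy pattern: each match ends at the nearest </div>.
SPAM = re.compile(r'<div style="position.*?</div>', re.DOTALL)


def strip_until_stable(text, pattern=SPAM, max_passes=10):
    """Apply pattern.sub repeatedly until the text stops changing.

    Returns the cleaned text and the number of passes that removed something.
    max_passes guards against an unexpected non-terminating case.
    """
    changed_passes = 0
    for _ in range(max_passes):
        cleaned = pattern.sub('', text)
        if cleaned == text:
            break
        text = cleaned
        changed_passes += 1
    return text, changed_passes


# The inner block splits the outer block's opening tag, so the outer block
# only becomes matchable after the first pass removes the inner one.
nested = ('Hello <div style="posi<div style="position:absolute">x</div>'
          'tion:absolute">buy</div>world')
print(strip_until_stable(nested))  # ('Hello world', 2)
```

Looping to a fixed point costs one extra pass that finds nothing, which is cheap compared with reasoning about every possible nesting.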

Some types of attack modify existing source files on the server or create new files that look like they belong. It is a good idea to reinstall all software into an empty directory rather than just reinstalling over the existing files.
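Before wiping an installation, it can be useful to see which files an attacker may have touched by comparing the live directory against a freshly downloaded copy of the same release. A minimal standard-library sketch, assuming both trees are available locally; find_suspicious_files is a hypothetical helper. Files such as wp-config.php and most of wp-content will legitimately differ, so the output is a list to review, not a verdict.

```python
import filecmp
import os


def find_suspicious_files(live_dir, clean_dir):
    """Compare a live install against a clean copy of the same release.

    Returns two sorted lists of paths relative to live_dir: files that exist
    only in the live install, and files whose contents differ byte for byte.
    """
    extra, modified = [], []
    for root, _dirs, files in os.walk(live_dir):
        for name in files:
            live_path = os.path.join(root, name)
            rel = os.path.relpath(live_path, live_dir)
            clean_path = os.path.join(clean_dir, rel)
            if not os.path.exists(clean_path):
                extra.append(rel)
            elif not filecmp.cmp(live_path, clean_path, shallow=False):
                modified.append(rel)
    return sorted(extra), sorted(modified)
```

Calling find_suspicious_files('/path/to/live', '/path/to/fresh-download') with your own paths returns the two lists. shallow=False makes filecmp compare contents rather than trusting size and modification time, which an attacker can easily preserve.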

Reassurance

I found in my analysis that the injection attacks on this website had occurred in 2014. It appears no additional compromises of this nature have happened since. It is nice to see vulnerabilities fixed and good hygiene having an effect.

Review

After the clean-up I reviewed the website, and specifically checked the four URLs flagged by Google to confirm they were as they should be.
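The spot check of the flagged URLs can also be scripted. A minimal sketch, assuming the pages are publicly reachable and that the injected markup still starts with the same opening tag; SPAM_MARKER, fetch_html and recheck are hypothetical names, and a theme that legitimately uses an inline position style would produce a false positive.

```python
import re
from urllib.request import urlopen

# Hypothetical marker taken from the injected markup described above.
SPAM_MARKER = re.compile(r'<div style="position', re.IGNORECASE)


def fetch_html(url):
    """Download a page and decode it leniently."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode('utf-8', errors='replace')


def recheck(urls, fetch=fetch_html):
    """Map each URL to True if the spam marker still appears in its HTML."""
    return {url: bool(SPAM_MARKER.search(fetch(url))) for url in urls}
```

Passing a different fetch function makes the check easy to exercise without network access, for example with a dictionary's __getitem__ mapping URLs to saved HTML.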

I returned to Google Search Console and requested a review of the website. While the web app suggested the review might take weeks, I was pleased to receive an email less than a week later: we're back in the clear.

Script

Here's the script I created to clean up the database. I was limited by what software was installed on the hosting service (Python 2.6 with an equally ancient database adapter) but it got the job done.

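You'll note that I'm passing a function to the regular expression substitution function, re.sub. re.sub calls that function once for each non-overlapping match and replaces the match with whatever it returns, here an empty string. Passing '' directly would behave the same way: whether the replacement is a string or a function, re.sub replaces every match unless you limit it with the count argument.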
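A quick way to confirm how re.sub treats its replacement argument is to run a function replacement, a string replacement and the count argument over the same input. The sample text is invented, and the pattern here is non-greedy so that the two blocks match separately (a greedy .* would run from the first block to the last </div> on the line).

```python
import re


def deletor(matchobj):
    # re.sub calls this once per match and substitutes the return value.
    return ''


spam = re.compile(r'<div style="position.*?</div>')
text = ('a<div style="position:absolute">x</div>'
        'b<div style="position:absolute">y</div>c')

print(spam.sub(deletor, text))      # abc  (function: every match removed)
print(spam.sub('', text))           # abc  (string: every match removed too)
print(spam.sub('', text, count=1))  # ab<div style="position:absolute">y</div>c
```

A function replacement earns its keep when the decision depends on the match itself, for example keeping a block unless it links to a known spam domain.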

Importantly, never do anything like this without backing up your database:

 mysqldump --add-drop-table -u [database user] -p [database name] > 170221-my-backup.sql

...and knowing how to restore it:

 mysql -u [database user] -p [database name] < 170221-my-backup.sql
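If backups become routine, the same mysqldump invocation can be wrapped in a small script. A minimal Python 3.5+ sketch, assuming the 170221 prefix in the file name above is a YYMMDD date; backup_filename, dump_command and run_backup are hypothetical helpers, and the mysqldump flags are exactly those shown above.

```python
import datetime
import subprocess


def backup_filename(day=None, label='my-backup'):
    """Build a YYMMDD-prefixed name such as 170221-my-backup.sql."""
    day = day or datetime.date.today()
    return '%s-%s.sql' % (day.strftime('%y%m%d'), label)


def dump_command(db_user, db_name):
    # -p with no attached value makes mysqldump prompt for the password.
    return ['mysqldump', '--add-drop-table', '-u', db_user, '-p', db_name]


def run_backup(db_user, db_name):
    path = backup_filename()
    with open(path, 'wb') as out:
        # Equivalent of the shell redirect "> file"; check=True raises
        # CalledProcessError if mysqldump exits with a non-zero status.
        subprocess.run(dump_command(db_user, db_name), stdout=out, check=True)
    return path
```

Building the argument list separately from running it keeps the command easy to inspect before anything touches the database. The clean-up script itself follows.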

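import re
import MySQLdb

def deletor(matchobj):
    # re.sub calls this for every match; returning '' deletes the match.
    return ''

def remove_spam(post_content):
    # .* is greedy and, without re.DOTALL, stops at line breaks: each match
    # runs from '<div style="position' to the last '</div>' on that line.
    spam = re.compile(r'<div style="position.*</div>')
    return re.sub(spam, deletor, post_content)

def clean_posts():
    cleaned_posts = []
    db = MySQLdb.connect(user="", passwd="", db="")  # fill in credentials
    cur = db.cursor()

    sql = '''
     SELECT id, post_content FROM wp_posts WHERE post_content LIKE '%div style="position%';
     '''
    cur.execute(sql)
    posts = cur.fetchall()
    print("Found %d rows." % len(posts))

    for row in posts:
        # (new content, id), in the order of the UPDATE placeholders below
        cleaned = (remove_spam(row[1]), int(row[0]))
        cleaned_posts.append(cleaned)

    sql = '''
    UPDATE wp_posts SET post_content = %s WHERE id = %s
    '''
    cur.executemany(sql, cleaned_posts)
    # Connections start with autocommit off (PEP 249), so commit explicitly;
    # otherwise the UPDATE is rolled back on InnoDB tables at disconnect.
    db.commit()
    cur.close()
    db.close()

if __name__ == '__main__':
    clean_posts()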
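Before letting the UPDATE loose, a dry run can show exactly what would be deleted from each post. A minimal sketch, assuming rows shaped like the (id, post_content) tuples returned by the SELECT above and a non-greedy variant of its pattern; preview_cleanup is a hypothetical helper, not part of the original script, and it never touches the database.

```python
import re

# Hypothetical non-greedy variant of the script's pattern.
SPAM = re.compile(r'<div style="position.*?</div>', re.DOTALL)


def preview_cleanup(rows, pattern=SPAM):
    """Dry run over (id, post_content) rows.

    Returns (id, removed_fragments, new_content) for every row that would
    change, leaving untouched rows out of the report.
    """
    report = []
    for post_id, content in rows:
        removed = pattern.findall(content)
        if removed:
            report.append((int(post_id), removed, pattern.sub('', content)))
    return report


rows = [(1, '<p>fine</p>'), (2, 'a<div style="position:absolute">x</div>b')]
for post_id, removed, new_content in preview_cleanup(rows):
    print(post_id, removed, repr(new_content))
```

Skimming the removed fragments is a cheap way to catch a pattern that is eating legitimate content before it is written back.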