No-code backlink data collection
Introduction
This tutorial explains how to create a backlink profile for each domain selected for the core website dataset. For this pilot report, we used 42 domains. This process must be completed for each of the 42 websites.
Depending on your skill level, this task could take 10 to 30 minutes per domain. We recommend you create a master sheet at the beginning to keep track of the data sheets for each listed domain. This ensures tables aren’t confused or lost.
Date of collection | Domain | Link to data sheet |
2022-09-11 | astutenews |
When the tables are completed, it is time for the aggregated data analysis, which is described in a separate, as-yet-unpublished section. The backlinks in this process come from BuzzSumo, but you can use backlinks from other sources.
Step One: Data collection
Use the backlink search function in BuzzSumo to find links to each site in your list of domains. The domains chosen for this first round fit the criteria outlined in the classification section, which describes an evidence-based categorization algorithm. The domain search returns websites that have backlinked to any URL in the domain in the past year.
We selected “export CSV,” and BuzzSumo exported the backlinks. The column headings in the sheet were as follows: title, URL, evergreen_score, total_engagement, total_facebook_engagement, twitter_shares, pinterest_shares, total_reddit_engagement, published_date, article_types, video, article_amplifiers, author_name, num_words, thumbnail, facebook_comments, facebook_shares, facebook_likes, num_linking_domains, wow_count, love_count, haha_count, sad_count, and angry_count.
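For readers who would rather script this step than work in a spreadsheet, the sketch below shows one way to read the URL column from an exported file. It is an illustration only: the sample rows are made up, the helper name read_backlink_urls is hypothetical, and the column name "URL" is assumed to match the heading listed above, so check it against your own export.

```python
import csv
import io

# Made-up stand-in for an exported file; a real export has many more columns.
SAMPLE_EXPORT = """title,URL,total_engagement
First story,https://www.example.com/a,12
Second story,http://example.org/b,3
"""

def read_backlink_urls(handle, url_column="URL"):
    """Return the non-empty values of the URL column from an open CSV file."""
    reader = csv.DictReader(handle)
    if url_column not in (reader.fieldnames or []):
        raise KeyError(f"column {url_column!r} not found in export")
    return [row[url_column] for row in reader if row[url_column]]

urls = read_backlink_urls(io.StringIO(SAMPLE_EXPORT))
print(urls)
```

With a file on disk, the same function can be called on open("export.csv", newline="", encoding="utf-8"), where the file name is a placeholder.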
Step Two: Data cleaning and analysis
For this analysis, we opened the data files in Google Sheets. The formula below is written for Google Sheets and may not work in other programs. Once the data loads, add a column between the “URL” and “evergreen_score” columns (columns B and C): highlight column C, right-click, and select “Insert 1 column left.”
In the newly inserted column, which is now column C, enter the formula:
=REGEXREPLACE(REGEXREPLACE(B:B,"^(https?://)?(www\.)?",""),"/.*","")
This formula extracts the domain from each backlinking URL. Press Enter. Google Sheets should offer to autofill the column. Accept the autofill so that the domain is extracted for every URL.
If the autofill option does not appear after entering the formula and pressing enter, use the mouse to drag the blue dot in the lower right-hand corner of the selected cell. Click the blue dot, hold the click, and drag downward until you have dragged the mouse through all cells that need to contain the formula.
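The domain extraction can also be sketched outside the spreadsheet. The example below follows the same two-step idea as the formula (strip the scheme and a leading "www.", then drop everything from the first slash onward). Anchoring the first pattern to the start of the string and escaping the dot are choices made for this sketch, and the function name extract_domain and the sample URLs are illustrative.

```python
import re

def extract_domain(url: str) -> str:
    """Strip the scheme and a leading 'www.', then drop the path."""
    # Step 1: remove an optional http:// or https:// and an optional www. prefix.
    without_prefix = re.sub(r"^(https?://)?(www\.)?", "", url.strip())
    # Step 2: remove everything from the first slash to the end of the string.
    return re.sub(r"/.*", "", without_prefix)

urls = [
    "https://www.example.com/news/story.html",
    "http://example.org",
    "www.example.net/page?id=1",
]
domains = [extract_domain(u) for u in urls]
print(domains)  # ['example.com', 'example.org', 'example.net']
```

Like the formula, this keeps subdomains (for instance, "sub.example.com" stays as it is), so closely related hosts are counted separately.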
Step Three: Pivot table
Name the new column containing the domains “domain.” Select the “domain” column, which should now be column C. Right-click, select “View more column actions,” and then select “Remove links.” Leave the column selected.
Next, click “Insert” and select “Pivot table.” Then click “Create.” Google Sheets creates a new sheet for the pivot table and opens it automatically. This sheet is used to count how often each domain appears among the URLs in the backlink data file.
The only data column available is “domain,” since that is all we selected when creating the pivot table. Add “domain” under both “Rows” and “Values.” Under “Values,” set “Summarize by” to “COUNTA.” Then change the “Sort by” selection to “COUNTA of domain” and set the order to “Descending.”
At this point, the sheet lists the domains that link back to the domain we searched in BuzzSumo. The number next to each domain is the number of times it has backlinked to the searched domain over the past year.
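The pivot table amounts to counting how many times each domain appears and sorting the counts from highest to lowest. A minimal sketch of that counting step, using placeholder domains rather than real data:

```python
from collections import Counter

# Placeholder list standing in for the extracted "domain" column.
domains = [
    "a.example", "b.example", "a.example",
    "c.example", "a.example", "b.example",
]

# Count occurrences of each domain, then order by count, highest first.
counts = Counter(domains)
pivot = counts.most_common()

for domain, count in pivot:
    print(domain, count)
# a.example 3
# b.example 2
# c.example 1
```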
Step Four: Create nodes and weighted edges
We turned this data into nodes and weighted edges by adding a column to the left of column A and filling it with the domain of interest for that sheet.
The three resulting columns (the searched domain, the backlinking domain, and the count) are pasted into a new sheet labeled “primary edge,” under the column labels “target,” “source,” and “weight,” respectively. This sheet will be used to create the data visualization.
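As an illustration of the resulting layout, the sketch below builds the same three columns from a list of (domain, count) pairs and writes them as CSV text. The searched domain, the counts, and the helper names build_edges and edges_to_csv are all hypothetical; the column order simply follows the order described above.

```python
import csv
import io

def build_edges(target, counts):
    """Pair the searched domain (target) with each backlinking domain (source)."""
    return [
        {"target": target, "source": source, "weight": weight}
        for source, weight in counts
    ]

def edges_to_csv(edges):
    """Serialize the edge rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["target", "source", "weight"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(edges)
    return buffer.getvalue()

# Placeholder values: one searched domain and two backlinking domains.
counts = [("a.example", 3), ("b.example", 2)]
edges = build_edges("searched-site.example", counts)
csv_text = edges_to_csv(edges)
print(csv_text)
```

Repeating this for each searched domain and appending the rows would give one combined edge list, which mirrors pasting each sheet's columns into the “primary edge” sheet.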
A link to the backlink data should be saved with text about the domain so that this data is easily accessible at the end of the study. This process is repeated for every domain of interest until all of them have been entered.
Quite a few of the domains backlinking to astutenews.com are known for being a part of the pro-Kremlin ecosystem, like Veterans Today, which has a partnership with an outlet controlled by Russian intelligence; Voice of East, which publishes work from RT and Sputnik authors; and Modern Diplomacy EU, which Hoaxlines categorizes as a pro-Kremlin interface website.
We also see One World Press, which has a domain registered to InfoRos, a website run by the GRU. Global Research, the UNZ Review, Reseau International, and others that don't appear in the screenshot suggest that this website plays a role in the pro-Kremlin information ecosystem.