Python: A Script to Extract Google Autosuggest Trends for Your Niche Search Keywords

A Python script to capture autosuggest trends

Everyone likes Google Trends, but it gets a bit tricky when it comes to long-tail keywords. We all like the official Google Trends service for getting insights into search behavior. However, two things keep many people from using it for serious work:

  1. When you need to find new niche keywords, there is not enough data on Google Trends.
  2. There is no official API for making requests to Google Trends: when we use modules like pytrends, we have to use proxy servers, or we get blocked.

In this article, I'll share a Python script that we wrote to export trending keywords via Google Autosuggest.

Fetch and Store Autosuggest Results over Time

Suppose we have 1,000 seed keywords to send to Google Autosuggest. In return, we will probably get around 200,000 long-tail keywords. Then we need to do the same thing one week later and compare the two datasets to answer two questions:

  • Which queries are new keywords compared to the last time? This is probably the case we need. Google thinks those queries are becoming more significant, so by tracking them we can create our own Google Autosuggest trends solution!
  • Which queries are keywords that are no longer trending?
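
The week-over-week comparison can be sketched with plain set operations. This is only a minimal illustration, not the script's own implementation: the helper name `compare_snapshots` and the sample suggestions are made up for the example.

```python
def compare_snapshots(previous, current):
    """Split suggestions into new, still-present, and dropped sets."""
    previous, current = set(previous), set(current)
    return {
        "new": current - previous,            # appeared since the last run
        "still_present": current & previous,  # seen in both runs
        "dropped": previous - current,        # no longer suggested
    }

# Made-up example data, for illustration only
last_week = ["python tutorial", "python download", "python list"]
this_week = ["python tutorial", "python list", "python match statement"]

changes = compare_snapshots(last_week, this_week)
for label, items in changes.items():
    print(label, sorted(items))
```

The full script applies the same idea per seed keyword and keeps a first-seen and last-seen date for every suggestion instead of three separate sets.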

The script is quite simple, and most of the code is shared here. The updated code saves the data from previous runs and compares the suggestions over time. To keep things simple we avoided file-based databases like SQLite, so all the data storage uses the CSV files described below. This lets you import the file into Excel and explore niche keyword trends for your business.

To Use This Python Script

  1. Enter the seed keyword set that should be sent to the autocomplete: keywords.csv
  2. Adjust the script settings to your needs:
    • LANGUAGE: default “en”
    • COUNTRY: default “US”
  3. Schedule the script to run once a week. You can also run it manually whenever you like.
  4. Use keyword_suggestions.csv for further analysis:
    • first_seen: the date on which the query appeared in the autosuggest for the first time
    • last_seen: the date on which the query was seen for the last time
    • is_new: if first_seen == last_seen, we set this to True. Just filter on this value to get the new trending searches in the Google autosuggest.
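
As a rough sketch of that last analysis step, the snippet below filters on is_new with pandas. The rows are made-up sample data in the column layout the script writes, loaded from an in-memory string so the example runs on its own; with a real file you would pass "keyword_suggestions.csv" to `pd.read_csv` instead. Restricting the result to the most recent run is an extra step assumed here, because a suggestion that was seen only once in an older run also has first_seen == last_seen.

```python
import io

import pandas as pd

# Made-up sample in the column layout of keyword_suggestions.csv
sample_csv = io.StringIO(
    "first_seen,last_seen,Keyword,Suggestion,is_new\n"
    "2021-03-08,2021-03-08,python,python match statement,True\n"
    "2021-03-01,2021-03-08,python,python tutorial,False\n"
    "2021-03-01,2021-03-01,python,python 2 download,True\n"
)

df = pd.read_csv(sample_csv, parse_dates=["first_seen", "last_seen"])

# Rows flagged is_new were seen on exactly one run
new_rows = df[df["is_new"]]

# Keep only the most recent run to drop one-off suggestions from older runs
latest_run = df["last_seen"].max()
fresh = new_rows[new_rows["last_seen"] == latest_run]

print(fresh[["Keyword", "Suggestion"]].to_string(index=False))
```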

Here's the Python Code

# Pemavor.com Autocomplete Trends
# Author: Stefan Neefischer (stefan.neefischer@gmail.com)
import concurrent.futures
from datetime import datetime
import pandas as pd
import itertools
import requests
import string
import json
import time

charList = " " + string.ascii_lowercase + string.digits

def makeGoogleRequest(query):
    # If you make requests too quickly, you may be blocked by google 
    time.sleep(WAIT_TIME)
    URL="http://suggestqueries.google.com/complete/search"
    PARAMS = {"client":"opera",
            "hl":LANGUAGE,
            "q":query,
            "gl":COUNTRY}
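    try:
        # A timeout keeps a stalled connection from blocking a worker thread forever
        response = requests.get(URL, params=PARAMS, timeout=10)
    except requests.RequestException:
        # Treat network failures like any other failed request
        return "ERR"
    if response.status_code == 200:
        try:
            suggestedSearches = json.loads(response.content.decode('utf-8'))[1]
        except ValueError:
            # Decoding or parsing as UTF-8 failed: retry with latin-1
            suggestedSearches = json.loads(response.content.decode('latin-1'))[1]
        return suggestedSearches
    else:
        return "ERR"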

def getGoogleSuggests(keyword):
    # err_count1 = 0
    queryList = [keyword + " " + char for char in charList]
    suggestions = []
    for query in queryList:
        suggestion = makeGoogleRequest(query)
        if suggestion != 'ERR':
            suggestions.append(suggestion)

    # Remove empty suggestions
    suggestions = set(itertools.chain(*suggestions))
    if "" in suggestions:
        suggestions.remove("")
    return suggestions

def autocomplete(csv_fileName):
    dateTimeObj = datetime.now().date()
    #read your csv file that contain keywords that you want to send to google autocomplete
    df = pd.read_csv(csv_fileName)
    keywords = df.iloc[:,0].tolist()
    resultList = []

    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futuresGoogle = {executor.submit(getGoogleSuggests, keyword): keyword for keyword in keywords}

        for future in concurrent.futures.as_completed(futuresGoogle):
            key = futuresGoogle[future]
            for suggestion in future.result():
                resultList.append([key, suggestion])

    # Convert the results to a dataframe
    suggestion_new = pd.DataFrame(resultList, columns=['Keyword','Suggestion'])
    del resultList

    #if we have old results read them
    try:
        suggestion_df = pd.read_csv("keyword_suggestions.csv")
    except (FileNotFoundError, pd.errors.EmptyDataError):
        # First run: there are no stored results yet
        suggestion_df = pd.DataFrame(columns=['first_seen','last_seen','Keyword','Suggestion'])
    
    suggestionCommon_list=[]
    suggestionNew_list=[]
    for keyword in suggestion_new["Keyword"].unique():
        new_df=suggestion_new[suggestion_new["Keyword"]==keyword]
        old_df=suggestion_df[suggestion_df["Keyword"]==keyword]
        newSuggestion=set(new_df["Suggestion"].to_list())
        oldSuggestion=set(old_df["Suggestion"].to_list())
        commonSuggestion=list(newSuggestion & oldSuggestion)
        new_Suggestion=list(newSuggestion - oldSuggestion)
         
        for suggest in commonSuggestion:
            suggestionCommon_list.append([dateTimeObj,keyword,suggest])
        for suggest in new_Suggestion:
            suggestionNew_list.append([dateTimeObj,dateTimeObj,keyword,suggest])
    
    #new keywords
    newSuggestion_df = pd.DataFrame(suggestionNew_list, columns=['first_seen','last_seen','Keyword','Suggestion'])
    #shared keywords with date update
    commonSuggestion_df = pd.DataFrame(suggestionCommon_list, columns=['last_seen','Keyword','Suggestion'])
    # Match on Keyword and Suggestion so a suggestion shared by several seed keywords is not duplicated
    merge = pd.merge(suggestion_df, commonSuggestion_df, on=["Keyword", "Suggestion"], how='left', suffixes=("_old", "_new"))
    # Use the refreshed date where the suggestion was seen again, otherwise keep the stored one
    merge["last_seen"] = merge["last_seen_new"].fillna(merge["last_seen_old"])
    merge = merge.drop(columns=["last_seen_old", "last_seen_new"])
    
    #merge old results with new results
    frames = [merge, newSuggestion_df]
    keywords_df =  pd.concat(frames, ignore_index=True, sort=False)
    # Save dataframe as a CSV file
    keywords_df['first_seen'] = pd.to_datetime(keywords_df['first_seen'])
    keywords_df['last_seen'] = pd.to_datetime(keywords_df['last_seen'])
    keywords_df = keywords_df.sort_values(by=['first_seen','Keyword'], ascending=[False,False])
    keywords_df['is_new'] = (keywords_df['first_seen']== keywords_df['last_seen'])
    keywords_df=keywords_df[['first_seen','last_seen','Keyword','Suggestion','is_new']]
    keywords_df.to_csv('keyword_suggestions.csv', index=False)

# If you use more than 50 seed keywords you should slow down your requests - otherwise google is blocking the script
# If you have thousands of seed keywords use e.g. WAIT_TIME = 1 and MAX_WORKERS = 5
WAIT_TIME = 0.2
MAX_WORKERS = 20
# set the autocomplete language
LANGUAGE = "en"
# set the autocomplete country code - DE, US, TR, GR, etc..
COUNTRY="US"
# Keyword_seed csv file name. One column csv file.
#csv_fileName="keyword_seeds.csv"
CSV_FILE_NAME="keywords.csv"
autocomplete(CSV_FILE_NAME)
#The result will save in keyword_suggestions.csv csv file
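
To get a feel for the request volume behind the WAIT_TIME and MAX_WORKERS settings, here is a small sketch of how one seed keyword expands into queries, following the list comprehension in getGoogleSuggests. The helper names `expand_seed` and `min_sleep_seconds` are hypothetical and not part of the script, and the runtime figure is only a rough lower bound that counts the sleep calls and ignores network time.

```python
import string

# Same character set as the script: a space, a-z and 0-9
char_list = " " + string.ascii_lowercase + string.digits

def expand_seed(keyword):
    """Return the queries the script would send for one seed keyword."""
    return [keyword + " " + char for char in char_list]

def min_sleep_seconds(seed_count, wait_time, max_workers):
    """Rough lower bound on runtime from the sleep calls alone."""
    total_requests = seed_count * len(char_list)
    return total_requests * wait_time / max_workers

queries = expand_seed("python")
print(len(queries))   # 37
print(queries[1:4])   # ['python a', 'python b', 'python c']

# 1,000 seeds with WAIT_TIME = 1 and MAX_WORKERS = 5
print(min_sleep_seconds(1000, wait_time=1, max_workers=5))  # 7400.0
```

With the settings suggested for thousands of seed keywords, a weekly run therefore takes at least about two hours.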

Download the Python Script
