A program that can search for an activity's main posts, and also for the main posts made by a given username

Source: nearby, 2022-04-25 14:26:58 (23939 bytes)

This post was edited by regular user nearby on 2022-06-07 04:30:52.

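邻兄 found some time today to update the earlier program, so this Python program can now find all the main posts a user has made in a forum. For example, searching for 尘凡无忧 in 书香之家 finds 169 main posts. 邻兄 has only 108 main posts.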

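Why add this feature? Because today I wanted to look at one of my own old posts, but I had already forgotten its title! It was so hard to find that I simply added this feature to the program.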
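For readers who want to see what the search request looks like before reading the whole program: it is an HTTP GET against the forum's archive.php page, with the query fields SubID, pos, keyword and username, plus page for later result pages. The sketch below only builds such URLs and does not contact the site. The helper name build_search_url and its arguments are hypothetical and do not appear in the program; the field names and the 'on' value for username searches are taken from the program itself.

```python
from urllib.parse import urlencode

BASE = 'https://bbs.wenxuecity.com/bbs/archive.php'


def build_search_url(subid, keyword, by_username=False, page=0):
    """Hypothetical helper: build the archive search URL for one result page."""
    params = {}
    if page > 0:
        # the first results page is requested without a page field
        params['page'] = str(page)
    params['SubID'] = subid        # short forum name, e.g. sxsj
    params['pos'] = 'bbs'
    params['keyword'] = keyword    # activity title or username
    params['username'] = 'on' if by_username else ''
    # urlencode percent-encodes non-ASCII text and characters such as '&'
    return BASE + '?' + urlencode(params)


print(build_search_url('sxsj', 'nearby', by_username=True))
print(build_search_url('sxsj', 'nearby', by_username=True, page=2))
print(build_search_url('sxsj', '尘凡无忧', by_username=True))
```

Building the query with urlencode rather than string concatenation is a design choice of this sketch: it keeps a keyword containing '&' or '#' from being cut off or misread as another field.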

# Author: nearby, moderator of 书香之家, March 2022
#
# This program allows you to
# 1. Collect the posts of an activity in a forum
# 2. Search for the main posts made by a given username
#
# Usage of this Python program:
# 0. Make sure that you have Internet access and Python 3 installed on your computer (or use Cloud)!
# 1. Place this file in a folder. Say, in a folder named "wxc"
# 2. Run the program and follow its prompts. Everything should work fine.
# 3. The result is stored inside 'wxc/sxzj-out.html'. You can then copy/paste the source code of 'sxzj-out.html' into your WXC new page. Done!
#
# Note
# 1. You will also be prompted for your forum's short English name. You can find it in the forum's URL.
#    For example, 书香之家 has the URL https://bbs.wenxuecity.com/sxsj/, so its English name is sxsj.
#    Other examples include: 美语世界 is mysj, 文化走廊 is culture, 诗词欣赏 is poetry, etc.
# 2. By default the entries are organized in reverse chronological order.
# Should you need them to be placed in chronological order, please do:
# Comment out the statement: mylist.reverse() by placing # in front of it, like: #mylist.reverse()
#
#

import requests


notargets = ['跟帖', '输入关键词', '内容查询', 'input name', '当前', '首页', '上一页', '尾页', '下一页']
notargets.append('archive')
# This is how SXZJ (书香之家) works. When 尘凡无忧 starts an activity, she always marks her activity like this.
notargets.append('##活动##')
# notargets.append('汇总')


def isInside(line, notargets_array):
    for t in notargets_array:
        if t in line:
            return True
    return False
# END

# the line looks like <a href="/sxsj/76799.html" target="_blank">【<em>春天的畅想</em>】春天属于女人</a>
# I need it to be <a href="https://bbs.wenxuecity.com/sxsj/76799.html" target="_blank">【<em>春天的畅想</em>】春天属于女人</a>
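def addHttp(line):
    at = line.split('href="')
    if len(at) < 2:
        # no link on this line: return it unchanged instead of raising IndexError
        return line
    line2 = '<a href="https://bbs.wenxuecity.com' + at[1]
    return line2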
# END

def processOneFile(target, html, mylist, searchedURL=None):
    # split the text by newline character to get an array of strings
    # (named 'lines' so the built-in all() is not shadowed)
    lines = html.text.split('\n')
    length = len(lines)
    i = 0
    if searchedURL is None:
        # activity mode: each entry is a title line followed by an author line and a date line
        while i < length:
            line = lines[i]
            # i + 2 < length guarantees that the two following lines exist
            if (target in line) and (not isInside(line, notargets)) and (i + 2 < length):
                line = addHttp(line)
                print(line)
                i = i + 1
                line2 = lines[i]
                # look like: [书香之家] - <strong>WXCTEATIME</strong>(6987 bytes ), need to be WXCTEATIME only
                parts = line2.replace('</strong>', '<strong>').split('<strong>')
                if len(parts) > 1:
                    line2 = parts[1]
                i = i + 1
                line3 = lines[i]
                line += "  " + line2 + "  " + line3
                mylist.append(line)
            i = i + 1
    else:
        # username mode: a link line, then the author line, then the date line
        searchedUsername = '<strong><em>' + target + '</em></strong>'
        while i < length:
            line = lines[i]
            if (searchedURL in line) and ('target="_blank"' in line) and (not isInside(line, notargets)) and (i + 2 < length):
                i = i + 1
                line2 = lines[i]
                # look like: <strong><em>nearby</em></strong>,
                if searchedUsername in line2:
                    line = addHttp(line)
                    # add date information on 6-7-2022, which looks like:    <i>2022-04-24</i>
                    i = i + 1
                    line = line + ' ' + lines[i]
                    mylist.append(line)
                    print(line)
            i = i + 1
# END of FUNCTIONS


# ---- main starts here ----

print()
print('# Author: nearby, moderator of 书香之家, March 2022')
print()

use = 0 # 0 = activity, 1 = username
try:
    use = int(input("Is your search based on activity name or username? If username, enter 1, otherwise enter 0: "))
except ValueError:
    # int() raises ValueError for empty or non-numeric input
    print('Wrong input. Assume you are searching for activity posts. use=0')
    use = 0
#
if use == 1:
    # strip first, so that whitespace-only input also falls back to the default
    target = input('What is the username (for example: nearby or 尘凡无忧)?:  ').strip()
    if target == '':
        target = 'nearby'
    pages = 100  # default
    temp = input('How many pages are there when you search for the username in WXC? (If you do not know, just hit ENTER; the default of 100 is assumed): ').strip()
    if temp.isdecimal():  # keep the default on empty or non-numeric input
        pages = int(temp)
else:
    target = input('What is the title of your activity (活动)?:  ').strip()
    pages = 3 # default, means there are maximum 150 entries
    temp = input('How many pages are there when you search for the activity in WXC? (If you do not know, just hit ENTER): ').strip()
    if temp.isdecimal():  # keep the default on empty or non-numeric input
        pages = int(temp)

subid = 'sxsj'
temp = input('What is the short English name of your forum (论坛)? For example, 书香之家 is sxsj, 美语世界 is mysj, 文化走廊 is culture, 诗词欣赏 is poetry: ').strip()
if len(temp) >= 2:
    subid = temp

mylist = []
# this is the output file.
html2 = open('sxzj-out.html', 'w', encoding='utf-8')

useron = ''
if use == 1:
    useron = 'on'

# Let requests build and URL-encode the query string, so a title or username
# containing characters such as '&' or '#' is still sent as one keyword.
base_url = 'https://bbs.wenxuecity.com/bbs/archive.php'
searchedURL = None
if use == 1:
    searchedURL = '<a href="/' + subid + '/'

for i in range(pages):
    params = {'SubID': subid, 'pos': 'bbs', 'keyword': target, 'username': useron}
    if i > 0:
        # the first results page is requested without a page parameter
        params['page'] = str(i)
    f = requests.get(base_url, params=params, timeout=30)
    processOneFile(target, f, mylist, searchedURL)

if use != 1:
    mylist.reverse()

for li in mylist:
    html2.write('<p>' + li+'\n')
html2.close()

print("\n")
print(str(len(mylist)) + " entries")
print("\n")
print("Please check the file sxzj-out.html. The result is in it! Thanks for using this program. ---- 虎哥 / Nearby / 邻兄")
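To make the username search easier to follow, here is a small offline sketch of the scanning step. It assumes, based only on the comments in the program, that each result appears as three consecutive lines: a link line, a line naming the author, and a date line. The sample text, the post id 70001 and the names absolute_link and posts_by_user are hypothetical, and the real page markup may differ. The loop is also simplified compared with processOneFile: it does not apply the notargets filter.

```python
SITE = 'https://bbs.wenxuecity.com'

# Hypothetical sample modelled on the line shapes described in the program's comments.
SAMPLE = '\n'.join([
    '<a href="/sxsj/76799.html" target="_blank">【<em>春天的畅想</em>】春天属于女人</a>',
    '<strong><em>nearby</em></strong>,',
    '<i>2022-04-24</i>',
    '<a href="/sxsj/70001.html" target="_blank">Some other post</a>',
    '<strong><em>someone_else</em></strong>,',
    '<i>2022-04-23</i>',
])


def absolute_link(line):
    """Same idea as addHttp: prefix the relative href with the site root."""
    parts = line.split('href="')
    if len(parts) < 2:
        return line  # no link on this line
    return '<a href="' + SITE + parts[1]


def posts_by_user(text, username, subid):
    """Collect 'link date' entries whose author line names the given user."""
    link_prefix = '<a href="/' + subid + '/'
    marker = '<strong><em>' + username + '</em></strong>'
    lines = text.split('\n')
    found = []
    i = 0
    # stop while a full three-line record still fits
    while i + 2 < len(lines):
        if link_prefix in lines[i] and marker in lines[i + 1]:
            found.append(absolute_link(lines[i]) + ' ' + lines[i + 2])
            i += 3
        else:
            i += 1
    return found


for entry in posts_by_user(SAMPLE, 'nearby', 'sxsj'):
    print(entry)
```

On this sample only the first record is kept, because the second one belongs to a different author.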


All replies:

• Just a newbie passing by -望沙- ♀ (0 bytes) 04/25/2022 14:47:48

• Just let me know when you need it, and I will run the search for you -nearby- ♂ (0 bytes) 04/25/2022 15:00:44

• A bit hard to learn... -lovecat08- ♀ (0 bytes) 04/25/2022 16:23:18

• Python is relatively easy -nearby- ♂ (0 bytes) 04/25/2022 16:53:13

• Kudos on 邻兄's program, it reads like poetry. -东风再起- ♂ (0 bytes) 04/25/2022 16:27:39

• It does look like one :-) -nearby- ♂ (0 bytes) 04/25/2022 16:50:00

• Nice! -WXCTEATIME- ♂ (0 bytes) 04/25/2022 17:23:25

• Kudos for the thoughtfulness! -applebee3- ♀ (0 bytes) 04/25/2022 17:40:17

• Impressive! Though I cannot follow it, it certainly looks formidable :) -浮云驰- ♀ (0 bytes) 04/26/2022 02:07:28

• An expert! -laopika- ♂ (0 bytes) 04/26/2022 10:17:44
