How can you quickly collect activity posts into a summary?

Posted by: nearby, 2022-04-10 08:26:38 (18525 bytes)
This post was edited by nearby on 2022-04-10 21:18:10.

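妖妖灵 from 美语世界 (is she a moderator?) came to ask 邻兄 how to compile activity posts into a summary. 邻兄 then did two things:

  1. Asked her to call 邻兄 '虎哥' (she did). In for a penny, in for a pound: to serve everyone, the program is made public here. 邻兄 is a peerless master of Java and Excel (do not come to me for advice, I really have no time to answer questions), but 邻兄 is not a Python master. I have only just learned Python, so I now write everything in Python to get familiar with it.
  2. Added many explanatory comments to 邻兄's Python program for her.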
 
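You can copy and paste the code below into a Python file. If Python 3 is installed on your computer (the program also imports the third-party requests package, which must be installed too), run the file and follow the prompts; collecting the activity posts then becomes almost fully automatic.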
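
Before running the program, it may help to confirm that the prerequisites are in place. The following is only a minimal sketch: the function name check_environment is made up for illustration and is not part of the original program. It checks for Python 3 and for the requests package that the program imports.

```python
import importlib.util
import sys


def check_environment(required=("requests",)):
    """Return a list of problems found; an empty list means ready to run.

    Hypothetical helper, not part of the original program.
    """
    problems = []
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    for name in required:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: " + name)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Ready to run." if not issues else "\n".join(issues))
```

If a package is reported missing, it can usually be installed with pip (for example: pip install requests).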
 
Good luck! I will not answer follow-up questions!
 
 
# Author: 书香之家版主 nearby, March 2022
#
# Usage of this Python program:
# 0. Make sure that you have Internet access and Python 3 with the 'requests' package installed on your computer (or use a cloud environment)!
# 1. Place this file in a folder, say a folder named "wxc".
# 2. Go to your forum (论坛) and search for your activity (活动) title. You will get one or more pages. Remember how many pages there are.
#       If you do not know how to do this, just skip this step; the program then assumes that there are 3 pages (150 entries, which is more than usual).
# 3. Execute this program. You will be prompted for the name of your activity and
#    the number of pages you obtained in step 2 (if you do not know the number of pages, just hit ENTER).
#    Example:
#               春天的畅想
#               3 (or hit the ENTER key)
# 4. You will also be prompted for your forum's name in English letters. You can look this up in your forum's URL.
#    For example, 书香之家 has the URL https://bbs.wenxuecity.com/sxsj/, so its English name is sxsj.
#    Other examples include: 美语世界 is mysj, 文化走廊 is culture, 诗词欣赏 is poetry, etc.
# 5. The result is stored in 'wxc/sxzj-out.html'. You can then copy/paste the source code of 'sxzj-out.html' into a new WXC post. Done!
#
#
# Note: By default the entries are organized in reverse chronological order.
# Should you need them in chronological order,
# comment out the statement mylist.reverse() by placing # in front of it, like: #mylist.reverse()
#
#

import requests


notargets = ['跟帖', '输入关键词', '内容查询', 'input name', '当前', '首页', '上一页', '尾页', '下一页']
notargets.append('archive')
# This is how SXZJ (书香之家) works. When 无忧 starts an activity, she always marks her activity like this.
notargets.append('##活动##')
# notargets.append('汇总')


def isInside(line, notargets_array):
    for t in notargets_array:
        if t in line:
            return True
    return False
# END

# the line looks like <a href="/sxsj/76799.html" target="_blank"><em>春天的畅想</em>】春天属于女人</a>
# I need it to be <a href="https://bbs.wenxuecity.com/sxsj/76799.html" target="_blank"><em>春天的畅想</em>】春天属于女人</a>
def addHttp(line):
    at = line.split('href="')
    line2 = '<a href="https://bbs.wenxuecity.com' + at[1]
    return line2
# END

def processOneFile(target, html, mylist):
    # split the text by newline character to get a list of strings
    lines = html.text.split('\n')
    length = len(lines)
    i = 0
    # each entry uses the matching line plus the two lines after it,
    # so stop while two more lines still remain
    while i + 2 < length:
        line = lines[i]
        if (target in line) and ('href="' in line) and (not isInside(line, notargets)):
            line = addHttp(line)
            print(line)
            i = i + 1
            line2 = lines[i]
            # looks like: [书香之家] - <strong>WXCTEATIME</strong>(6987 bytes ), need to be WXCTEATIME only
            parts = line2.replace('</strong>', '<strong>').split('<strong>')
            line2 = parts[1] if len(parts) > 1 else ''
            i = i + 1
            line3 = lines[i]
            line += "  " + line2 + "  " + line3
            mylist.append(line)
        i = i + 1
# END of FUNCTIONS


# ---- main starts here ----

print()
print('# Author: 书香之家版主 nearby, March 2022')
print()

target = input('What is the title of your activity (活动)?:  ').strip()
pages = 3 # default, means there are at most 150 entries
temp = input('How many pages are there when you search for the activity in WXC? (If you do not know, just hit ENTER): ').strip()
# keep the default unless a whole number was typed
if temp.isdecimal():
    pages = int(temp)

subid = 'sxsj'
temp = input('What is the name of your forum (论坛) in English? For example, 书香之家 is sxsj, 美语世界 is mysj, 文化走廊 is culture, 诗词欣赏 is poetry: ').strip()
if len(temp) >= 2:
    subid = temp

mylist = []

# let requests build the query string, so that a title containing spaces, '&' or '#' is encoded correctly
archive = 'https://bbs.wenxuecity.com/bbs/archive.php'
query = {'SubID': subid, 'pos': 'bbs', 'keyword': target, 'username': ''}

f = requests.get(archive, params=query)
processOneFile(target, f, mylist)
for i in range(1, pages):
    f = requests.get(archive, params=dict(query, page=str(i)))
    processOneFile(target, f, mylist)

mylist.reverse()
# this is the output file; 'with' closes it even if writing fails
with open('sxzj-out.html', 'w', encoding='utf-8') as html2:
    for li in mylist:
        html2.write('<p>' + li + '\n')

print("\n")
print(str(len(mylist)) + " entries")
print("\n")
print("Please check the file sxzj-out.html. The result is in it! Thanks for using this program. ---- 虎哥 / Nearby / 邻兄")
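
To see what the parsing step does without going online, here is a small offline sketch under stated assumptions. The first two sample lines follow the formats quoted in the program's comments; the third line (a date) is an assumed placeholder, since the real page layout is not shown in the post. The helper names make_absolute, extract_author and collect_entries are hypothetical and are not the program's own functions.

```python
SITE = "https://bbs.wenxuecity.com"


def make_absolute(link_line):
    """Prefix the relative href in a result line with the site address."""
    return link_line.replace('href="/', 'href="' + SITE + '/', 1)


def extract_author(author_line):
    """Return the text between <strong> and </strong>, or '' if absent."""
    start = author_line.find("<strong>")
    end = author_line.find("</strong>")
    if start == -1 or end == -1 or end < start:
        return ""
    return author_line[start + len("<strong>"):end]


def collect_entries(page_text, title):
    """Join each matching link line with the two lines that follow it."""
    lines = page_text.split("\n")
    entries = []
    i = 0
    while i + 2 < len(lines):
        if title in lines[i] and 'href="' in lines[i]:
            link = make_absolute(lines[i].strip())
            author = extract_author(lines[i + 1])
            trailer = lines[i + 2].strip()
            entries.append(link + "  " + author + "  " + trailer)
            i += 3
        else:
            i += 1
    return entries


# Hypothetical page fragment; the date line is an assumption.
SAMPLE = "\n".join([
    '<a href="/sxsj/76799.html" target="_blank"><em>春天的畅想</em>】春天属于女人</a>',
    '[书香之家] - <strong>WXCTEATIME</strong>(6987 bytes )',
    '03/20/2022',
    '<div>unrelated line</div>',
])

entries = collect_entries(SAMPLE, "春天的畅想")
print(entries[0])
```

The sketch reads three consecutive lines per entry, as the program does, but guards against missing tags and against running past the end of the page.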

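Activity titles are usually Chinese and may contain spaces or symbols, so the search URL is safest when the query string is percent-encoded. The sketch below uses only the standard library; the parameter names (page, SubID, pos, keyword, username) are the ones in the program's URLs, while the function name build_search_url and the sample keyword are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ARCHIVE = "https://bbs.wenxuecity.com/bbs/archive.php"


def build_search_url(subid, keyword, page=None):
    """Build an archive search URL with the keyword percent-encoded.

    Hypothetical helper, not part of the original program.
    """
    params = {}
    if page is not None:
        params["page"] = str(page)
    params.update({"SubID": subid, "pos": "bbs", "keyword": keyword, "username": ""})
    return ARCHIVE + "?" + urlencode(params)


url = build_search_url("sxsj", "春天的畅想 & more", page=1)
print(url)

# Decoding the query string recovers the original keyword unchanged.
decoded = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(decoded["keyword"][0])
```

Without encoding, a title containing '&' would be cut off at that character, because the server would read the rest as a separate parameter.
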
All replies:

Great! -WXCTEATIME- (0 bytes) 04/10/2022 08:34:44

Great! -可能成功的P- (0 bytes) 04/10/2022 08:41:00

Praise for 邻兄; the spirit of sharing is commendable... and praise for 邻兄's wisdom too, such as firmly refusing to say anything more... :) -尘凡无忧- (0 bytes) 04/10/2022 08:48:48

By the way, 妖妖灵 is the moderator of the 美语 forum. :) Also, I was stunned by the words "peerless master".... LOL -尘凡无忧- (0 bytes) 04/10/2022 08:53:13

Just curious: I did not see her among the moderators' names. Who is she? :-) Boasting, boasting; a reputation is built on boasting :-) -nearby- (424 bytes) 04/10/2022 08:57:38

LOL, praise for this boasting power.... :) -尘凡无忧- (0 bytes) 04/10/2022 09:05:00

She is a moderator. A person can have many outfits, right? :) -WXCTEATIME- (0 bytes) 04/10/2022 09:06:21

Once the program starts, you only need to enter the activity name and everything is taken care of. 邻兄 will not reply any further. Thanks to the friends at 书香 (and to 茶兄, 小p and 忧忧 above) -nearby- (0 bytes) 04/10/2022 08:55:52

Praise! -lovecat08- (0 bytes) 04/10/2022 08:57:07

Posting a program without even writing a manual to go with it is actually rather wicked -kirn- (0 bytes) 04/10/2022 09:08:24

I have to criticize 小k: half of the program is manual, explaining twice how to use it -nearby- (177 bytes) 04/10/2022 09:11:00

As someone who has used similar simple "big snake" (Python) programs, I can tell you, pitifully, that I got confused by file names and the like. Unless you use it often, you forget in the blink of an eye.. even which -kirn- (95 bytes) 04/10/2022 09:27:02

Actually, those who understand this get it at a glance, and those who do not would have too much catching up to do... 邻兄 is sharing this for free, too; this work should really be done by the 文学城 technical department... -尘凡无忧- (0 bytes) 04/10/2022 09:11:35

Is there a technical department? I thought it was mainly a marketing department... The volunteers, though, are each astonishingly skilled -kirn- (0 bytes) 04/10/2022 09:28:49

There is. :) -尘凡无忧- (0 bytes) 04/10/2022 10:39:51

A 邻兄 who does not want to be a moderator is not a good kitty.. I will take a detour. :) -鲁冰花- (0 bytes) 04/10/2022 09:28:50

Wow wow wow, 虎哥 is truly a living Lei Feng!!! Thank you so much!!! I am hurrying to take it home and study it carefully!!! -妖妖灵- (936 bytes) 04/10/2022 11:08:24

I hope this helps 妖妹. 虎哥 tested it with four forums, especially yours and your activity, and it worked for all of them. The first time I compiled an activity I also did it by hand and was worn out :-) -nearby- (0 bytes) 04/10/2022 12:49:12

Praise for 邻版: excellent in both literary talent and high tech. -庄文雅- (0 bytes) 04/10/2022 11:29:27

Real talent! I am dizzy, taking a detour… :~) -老林子里的夏天- (0 bytes) 04/10/2022 11:39:13

邻兄 has shown that making technical improvements to the forum is actually not hard. I have suggested many times that the forum pilot hiding replies while still automatically bumping a post whenever it gets a reply; it is not very hard. 近兄 should be a technical consultant for 文城 -老键- (0 bytes) 04/10/2022 11:39:24

老键, come and join the activity... :) -尘凡无忧- (0 bytes) 04/10/2022 13:53:52

Ah, I did not notice you were running an activity. A programming contest? My Python is decent -老键- (0 bytes) 04/10/2022 14:31:39

Haha. It is the 人间情色 activity. I have read your 情色 piece.... LOL -尘凡无忧- (0 bytes) 04/10/2022 15:52:21

I forgot to say: 邻兄, please ask the site admin to pin this post in the forum's right-hand column for safekeeping... -尘凡无忧- (0 bytes) 04/10/2022 14:13:27

I do not understand it, but it sounds impressive. 邻兄 is mighty! -浮云驰- (0 bytes) 04/10/2022 14:55:55

Praise for 邻兄, so full of kindness! -applebee3- (0 bytes) 04/10/2022 15:34:09
