
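From Pinbox to Notion, and on to the Notion API

Note-taking software was created to torment humans

Two years ago, to make it easier to browse the articles I had bookmarked across various websites, I consolidated them all into an app called Pinbox, and then promptly forgot about it. Recently I needed to bookmark articles again, so I opened it up. Apart from a new paid tier, almost nothing had changed. I then found out that an invitation code unlocks nested collections, so I carried on freeloading with a clear conscience. Before long, presumably because my free ride on nested collections had been noticed, I could no longer create new bookmarks, even though I had not reached the bookmark limit. A year's membership is not expensive, but given Pinbox's rather crude interface and a feature set that has barely changed in two years, allow me to decline.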

The Pinbox interface

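So I went looking for a Pinbox replacement. At first I considered self-hosting a note-taking app, but the candidates had too few features and I don't have a server, so I dropped the idea. Then I found Notion. It is a note-taking app, yet it fit my requirements perfectly. There are plenty of introductions to Notion's features online, so I won't repeat them. To me, its biggest highlight is that the modular building blocks can be used for free.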

Pinbox laughs at me for being poor; I laugh at it for not understanding shut-in otaku


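Notion API

Normally the story would end here: I switched apps, nothing more. But one day, lying flat on my bed and scrolling Douban, I stumbled on the fact that Notion has an API, and bolted upright like a dying man startled awake. Ever since I started using Notion, I had wanted to import my Huaban (花瓣网) images and my NetEase Cloud Music (网易云) playlists into it. Seeing that an API existed kicked off the tinkering.

The Notion API can be used from Python. I had never written Python before, but at a glance it looked a bit like writing a Node.js scraper. Relying on my meager JS knowledge and my worrying level of intelligence, I plagiarized, I mean borrowed, the code from "Notion → 支付宝&微信 → 账单", lost a few hairs, and grudgingly, I mean happily, set off on my scraping journey.

Naturally there was a problem right at the start, as always happens when writing code. Running pip install requests fixed it.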

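Then I found the Notion API to be rather odd: an image link must end in a file extension, whereas Huaban serves its images from links that have no extension at all. Can't we just play nicely?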
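To see in advance which links would run into this, a check along the following lines may help. It is only a sketch: the set of accepted extensions is my assumption (the post only says that a suffix is required), and `has_image_suffix` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed list of extensions; the post only says that a suffix is required.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def has_image_suffix(url: str) -> bool:
    """Return True if the URL path ends in one of the assumed image suffixes."""
    path = urlparse(url).path  # ignores the query string and fragment
    return PurePosixPath(path).suffix.lower() in IMAGE_SUFFIXES


# A Huaban-style link (bare key, no extension) versus a link with a suffix.
print(has_image_suffix("https://hbimg.huabanimg.com/some-image-key"))     # False
print(has_image_suffix("https://example.com/covers/album.JPG?size=130"))  # True
```

Links that fail the check are the ones that need the Markdown-import detour.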

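Then it occurred to me that Notion can import Markdown files. So the plan became: save each image link as an .md file, import the files into Notion, and then update the links in those pages through the API. Once a test showed that this works, I started by scraping Huaban.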

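Huaban (花瓣网)

Huaban has tags, but its search cannot exclude keywords, which makes finding images a chore.

Huaban's mystifying search

Filtering in Notion

1. Collecting image links and metadata

In short: save each image's links to its own .md file, then write the image's tags, Huaban link, and source URL to a summary CSV (花瓣汇总.csv).

Code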
import csv
import json
import re

import requests

OUT_DIR = "E:/office/py/爬虫/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
    "X-Request": "JSON",
    "cookie": "cookie",  # paste your own cookie here
}
# First page of the board; replace the placeholder with the board URL
htmlPage = requests.get(url="<board URL>", headers=headers).content

for p in range(1, 25):
    # The pin list is the array that follows "pins": in the response
    appPins = re.findall(r'"pins".*', htmlPage.decode("utf-8"))
    # json.loads handles null/true/false, so eval() is not needed
    result = json.loads(appPins[0][7:-2])
    if not result:
        break  # no pins left on this board
    for i in result:
        key = str(i["file"]["key"])
        # One .md file per pin: thumbnail first, then the full-size image
        info = ("![](https://hbimg.huabanimg.com/" + key + "_fw236/format/webp)\n"
                "![](https://hbimg.huabanimg.com/" + key + ")")
        with open(OUT_DIR + str(i["pin_id"]) + ".md", "a", encoding="utf-8", newline="") as f:
            f.write(info + "\n")
        # Exactly five tag columns: drop extras, pad with "null"
        tags = ([str(t) for t in i["tags"]] + ["null"] * 5)[:5]
        link = str(i["link"]) or "None"  # an empty source link becomes "None"
        with open(OUT_DIR + "花瓣汇总.csv", "a", encoding="utf-8", newline="") as fo:
            csv.writer(fo, lineterminator="\n").writerow(
                [i["pin_id"], *tags, "https://huaban.com/pins/" + str(i["pin_id"]), link])
    # Waterfall paging: fetch the next batch once per page, starting after the last pin
    htmlPage = requests.get(url="<load URL prefix>" + str(result[-1]["pin_id"]) + "&limit=20&wfl=1", headers=headers).content
Pay attention to the URL that the waterfall (infinite-scroll) loader requests.
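The waterfall loading boils down to cursor-style paging: each request asks for the pins that come after the last pin_id already seen. Here is a minimal offline sketch of that loop. `iter_pins`, `fake_fetch` and the sample data are hypothetical stand-ins, and the real request URL is deliberately left out.

```python
from typing import Callable, Iterator, List, Optional


def iter_pins(fetch_page: Callable[[Optional[int]], List[dict]],
              max_pages: int = 25) -> Iterator[dict]:
    """Yield pins page by page, using the last pin_id as the cursor.

    fetch_page(None) should return the first page; fetch_page(pin_id)
    should return the pins that come after that pin.
    """
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        if not page:
            break  # an empty page means the board is exhausted
        yield from page
        cursor = page[-1]["pin_id"]


# Stand-in for the real HTTP call, so the sketch runs offline.
FAKE_BOARD = [{"pin_id": n} for n in (50, 40, 30, 20, 10)]


def fake_fetch(cursor, limit=2):
    ids = [p["pin_id"] for p in FAKE_BOARD]
    start = 0 if cursor is None else ids.index(cursor) + 1
    return FAKE_BOARD[start:start + limit]


print([p["pin_id"] for p in iter_pins(fake_fetch)])  # [50, 40, 30, 20, 10]
```

Passing the fetcher in as a function keeps the paging logic testable without touching the network.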

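Once that's written, next comes

If nothing goes wrong, the files look like this:

The .md files

Contents of the summary file

Contents of an .md file

2. Processing the data

Importing the .md files into Notion

After importing the .md files into Notion, I needed the page-id of every image page. This is another reason the Notion API feels odd: I could not get all the page-ids out of it. No scraper is needed here, though; a bit of JS in the browser console does the job.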

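function notion() {
  // Every row of the database view is one of these blocks
  const items = document.getElementsByClassName(
    'notion-selectable notion-page-block notion-collection-item');
  for (const item of items) {
    // Drop the first 41 characters of the link, leaving the page-id
    console.log(item.firstElementChild.href.slice(41));
  }
}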
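The fixed slice(41) depends on the exact length of the URL prefix. If the copied links are post-processed in Python anyway, a pattern match may be less brittle. This is a sketch that assumes the page-id is the run of 32 hex characters at the end of the URL path. `page_id_from_url` is a hypothetical helper and the URL below is made up.

```python
import re
from typing import Optional

# Assumption: the page-id is the 32 hex characters at the end of the URL path.
PAGE_ID_RE = re.compile(r"([0-9a-f]{32})(?:[?#].*)?$", re.IGNORECASE)


def page_id_from_url(url: str) -> Optional[str]:
    """Return the trailing 32-character hex id, or None if there is none."""
    match = PAGE_ID_RE.search(url)
    return match.group(1).lower() if match else None


# Made-up URL for illustration only.
print(page_id_from_url(
    "https://www.notion.so/workspace/12345-0123456789abcdef0123456789abcdef"))
# 0123456789abcdef0123456789abcdef
```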

Match the page-ids against the summary file

Code
import csv

filepath = "E:/office/py/date/花瓣汇总.csv"
filepath2 = "E:/office/py/date/notionlink.csv"

# notionlink.csv: page name (the pin id) in column 0, page-id in column 1
page_ids = {}
with open(filepath2, "r", encoding="utf-8", newline="") as f:
    for row2 in csv.reader(f):
        page_ids.setdefault(row2[0], row2[1])  # keep the first match

with open(filepath, "r", encoding="utf-8", newline="") as f, \
        open("E:/office/py/date/test.txt", "w", encoding="utf-8", newline="") as fo:
    writer = csv.writer(fo, lineterminator="\n")
    for row1 in csv.reader(f):
        # Output row: page-id, five tags, Huaban link, source link
        writer.writerow([page_ids[row1[0]], *row1[1:8]])
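If some pins never made it into Notion, the lookup fails on the first missing id. Here is a sketch of the same join that collects unmatched ids instead of stopping. The row shapes follow the two CSV files above, while `match_pages` and the sample data are invented for illustration.

```python
def match_pages(summary_rows, notion_rows):
    """Join summary rows to page-ids by pin id.

    summary_rows: lists shaped like [pin_id, tag1..tag5, huaban_link, source]
    notion_rows:  lists shaped like [page_name, page_id]
    Returns (matched_rows, unmatched_pin_ids).
    """
    page_ids = {}
    for name, page_id in notion_rows:
        page_ids.setdefault(name, page_id)  # keep the first hit, like list.index

    matched, unmatched = [], []
    for row in summary_rows:
        page_id = page_ids.get(row[0])
        if page_id is None:
            unmatched.append(row[0])
        else:
            matched.append([page_id, *row[1:]])
    return matched, unmatched


# Invented sample data
summary = [
    ["111", "a", "b", "null", "null", "null", "https://huaban.com/pins/111", "None"],
    ["222", "c", "null", "null", "null", "null", "https://huaban.com/pins/222", "None"],
]
notion = [["111", "page-id-for-111"]]

matched, unmatched = match_pages(summary, notion)
print(matched[0][0])  # page-id-for-111
print(unmatched)      # ['222']
```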

After matching

3. Updating the pages

After that long detour, the pages can finally be updated straight from 路径.csv

Code
import csv

import requests

filepath2 = "E:/office/py/date/路径.csv"
TOKEN = "your own token"


def add_bill(row):
    """Update one Notion page from a row: page-id, five tags, link, source link."""
    pageid, tags, link, link2 = row[0], row[1:6], row[6], row[7]
    body = {
        "properties": {
            "Tags": {"multi_select": [{"name": tag} for tag in tags]},
            "link": {"url": link},
            "源站": {"url": link2},  # the "source site" column
        },
    }
    r = requests.request(
        "PATCH",
        "https://api.notion.com/v1/pages/" + pageid,
        json=body,
        headers={"Authorization": "Bearer " + TOKEN, "Notion-Version": "2021-05-13"},
    )
    print(r.text)


# One PATCH request per row of 路径.csv
with open(filepath2, "r", encoding="utf-8", newline="") as f:
    for row2 in csv.reader(f):
        add_bill(row2)
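The script sends the "null" padding as if it were a real tag. Whether that is wanted is a matter of taste. Here is a sketch of a variant that drops the padding and repeated tags before building the properties. The property names follow the post's database, and `build_properties` is a hypothetical helper.

```python
def build_properties(tags, link, source):
    """Build the "properties" part of a page update.

    Drops the "null" padding and repeated tags before building the
    multi_select list. Property names follow the post's database.
    """
    seen = []
    for tag in tags:
        tag = tag.strip()
        if tag and tag != "null" and tag not in seen:
            seen.append(tag)
    return {
        "Tags": {"multi_select": [{"name": t} for t in seen]},
        "link": {"url": link},
        "源站": {"url": source},
    }


props = build_properties(["cat", " dog", "null", "null", "cat"],
                         "https://huaban.com/pins/111", "https://example.com/src")
print([t["name"] for t in props["Tags"]["multi_select"]])  # ['cat', 'dog']
```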

Once that's written, it's the same familiar

~

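What it looks like after the update

NetEase Cloud Music

It is bad enough that NetEase Cloud Music playlists cannot be browsed by album cover; individual songs cannot be tagged either. My playlists are mostly English and Japanese songs, so finding a track means listening through them one by one.

1. Collecting the data

The cover image links end in a file extension, so they can be added directly. The official NetEase Cloud Music API has heavy anti-scraping measures, though, so it is better to self-host an API (GitHub link),

Code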
import csv
import json

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
    "X-Request": "JSON",
    "cookie": "your own cookies",
}

# Replace the placeholders with your self-hosted API host and the playlist id
req = requests.get(url="https://<self-hosted API>/playlist/track/all?id=<playlist id>", headers=headers).content.decode("utf-8")

response_dict = json.loads(req)
song = response_dict["songs"]

with open("E:/office/py/爬虫/网易云/我喜欢的音乐.csv", "a", encoding="utf-8", newline="") as fo:
    # csv.writer quotes fields that contain commas, so names need no escaping
    writer = csv.writer(fo, lineterminator="\n")
    for i in song:
        # "ar" is a list of artist dicts; take the first artist
        artist = i["ar"][0]
        # Columns: song, song URL, album, album URL, cover URL, artist id, artist name
        writer.writerow([
            i["name"], "https://music.163.com/#/song?id=" + str(i["id"]),
            i["al"]["name"], "https://music.163.com/#/album?id=" + str(i["al"]["id"]),
            i["al"]["picUrl"], artist["id"], artist["name"],
        ])
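Since "ar" is a list, a song can carry more than one artist. Here is a sketch that keeps the first artist's id for the link and joins all the names. The sample entry is invented, shaped like the id/name/tns/alias entries that the string-replace chain expects, and `artist_fields` is a hypothetical helper.

```python
def artist_fields(song: dict):
    """Return (first artist id, all artist names joined with "/")."""
    artists = song.get("ar") or []
    if not artists:
        return "", ""
    names = "/".join(str(a["name"]) for a in artists)
    return str(artists[0]["id"]), names


# Invented sample data
sample = {"name": "Song", "ar": [
    {"id": 1, "name": "Artist A", "tns": [], "alias": []},
    {"id": 2, "name": "Artist B", "tns": [], "alias": []},
]}
print(artist_fields(sample))  # ('1', 'Artist A/Artist B')
```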
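Next, once again, comes the familiar

The data to be added

2. Adding to Notion

Note that the template has to be set up first: for example, to add the album field, first create a 专辑 (album) property in Notion.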
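One way to catch a forgotten property before sending anything is to compare the names used in the request body with a hand-kept list of the columns already created. This is a sketch only: `missing_properties` and the `EXISTING` set are hypothetical, and nothing here queries Notion.

```python
def missing_properties(body: dict, existing: set) -> list:
    """List property names used in the request body but absent from the database."""
    return sorted(name for name in body.get("properties", {}) if name not in existing)


# Hand-maintained list of columns already created in Notion (hypothetical).
EXISTING = {"Name", "name", "专辑", "歌手"}

body = {"properties": {"Name": {}, "专辑": {}, "封面": {}}}
print(missing_properties(body, EXISTING))  # ['封面']
```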

Code
import csv

import requests

TOKEN = "your own token"
DATABASE_ID = "your own database"


def linked_text(content, url):
    """A rich_text property holding one text run that links to url."""
    return {
        "type": "rich_text",
        "rich_text": [{"type": "text", "text": {"content": content, "link": {"url": url}}}],
    }


def add_bill(a, b, c, d, e, f, g):
    # a: song, b: song URL, c: album, d: album URL, e: cover URL, f: artist id, g: artist
    body = {
        "parent": {"database_id": DATABASE_ID},
        "properties": {
            # As in the original script, the title column holds the song URL
            "Name": {"title": [{"type": "text", "text": {"content": b}}]},
            "专辑": linked_text(c, d),  # album
            "name": linked_text(a, b),
            "歌手": linked_text(g, "https://music.163.com/#/artist?id=" + f),  # artist
            "封面": {  # cover
                "type": "files",
                "files": [{"name": e, "type": "external", "external": {"url": e}}],
            },
        },
    }
    r = requests.request(
        "POST",
        "https://api.notion.com/v1/pages",
        json=body,
        headers={"Authorization": "Bearer " + TOKEN, "Notion-Version": "2021-05-13"},
    )
    print(r.text)


filepath = "E:/office/py/爬虫/网易云/我喜欢的音乐.csv"

# One new database page per CSV row; only the first seven columns are used
with open(filepath, "r", encoding="utf-8", newline="") as f:
    for row2 in csv.reader(f):
        add_bill(*row2[:7])

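Next comes, as ever, the familiar

The view after adding


This should be where the post ends. It seems this could be combined with GitHub Actions so that the pages update in real time, but that is of little use to me: first, I listen to NetEase Cloud Music less these days and rarely add new songs; second, Huaban image links cannot be added to pages directly. So I'll leave it at that.

Many thanks to the following articles, whose code I borrowed: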

豆瓣标记导出到 Notion 并同步

Notion → 支付宝&微信 → 账单

以全新的 Notion API,尝试全新的记账方式
