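There is no need to render the page first, because you can fetch the data as JSON. The tricky part is that the JSON mirrors how the table is rendered (with td and colspan entries and so on), so it takes a little work to iterate through it all, but it is not impossible: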
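import requests
import pandas as pd

url = 'https://www.hkex.com.hk/eng/csm/ws/Highlightsearch.asmx/GetData'
# Query parameters: language plus the day, month and year to look up
payload = {'LangCode': 'en', 'TDD': '1', 'TMM': '11', 'TYYYY': '2019'}
jsonData = requests.get(url, params=payload).json()

rows = []
for row in jsonData['data']:
    data_row = []
    for idx, colspan in enumerate(row['colspan']):
        # Repeat each cell as many times as its colspan so the columns line up
        colspan_int = int(colspan[0])
        data_row.append(row['td'][idx] * colspan_int)
    # Flatten the repeated cells into a single flat row
    flat_list = [item for sublist in data_row for item in sublist]
    rows.append(flat_list)
# DataFrame.append was removed in pandas 2.0, so build the frame in one go
final_df = pd.DataFrame(rows)

# Keep the "Total market capitalisation" row and its first two columns
mask = final_df[0].str.contains(r'Total market capitalisation(?!$)', na=False)
df = final_df[mask].iloc[:, :2].copy()
# `date` was never defined in the original snippet; assumed here to be the queried day
date = '{TDD}/{TMM}/{TYYYY}'.format(**payload)
df['date'] = date
df.to_csv('file.csv', index=False)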
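The colspan expansion done in the loop can be tried without hitting the network. What follows is a minimal sketch on a made-up payload: the nested-list layout of `td` and `colspan` is only inferred from how the code above indexes them, so treat it as an assumption, and `expand_row` and `sample` are hypothetical names rather than anything from the HKEX response.

```python
def expand_row(row):
    """Repeat each cell by its colspan so every row ends up with one value per column."""
    cells = []
    for td, colspan in zip(row['td'], row['colspan']):
        # Assumption: td is a one-element list and colspan[0] holds the span as text
        cells.extend(td * int(colspan[0]))
    return cells


# Hypothetical sample shaped like the structure the loop above assumes
sample = {
    'data': [
        {'td': [[''], ['Hong Kong Exchange'], ['Shanghai Stock Exchange']],
         'colspan': [['1'], ['2'], ['2']]},
        {'td': [['No. of listed companies'], ['2,031'], ['383'], ['1,488'], ['50']],
         'colspan': [['1'], ['1'], ['1'], ['1'], ['1']]},
    ]
}

rows = [expand_row(row) for row in sample['data']]
for cells in rows:
    print(cells)
```

Because header cells are repeated rather than merged, both rows come out with the same number of cells, which is what lets a list of such rows be turned into a rectangular DataFrame.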
Output:
print(final_df.to_string())
    0                                                  1                                     2                                     3                                         4                                         5                                         6
 0                                                     Hong Kong <br>Exchange (01/11/2019 )  Hong Kong <br>Exchange (01/11/2019 )  Shanghai Stock<br>Exchange (01/11/2019 )  Shanghai Stock<br>Exchange (01/11/2019 )  Shenzhen Stock<br>Exchange (01/11/2019 )  Shenzhen Stock<br>Exchange (01/11/2019 )
 1                                                     Main Board                            GEM                                   A Share                                   B Share                                   A Share                                   B Share
 2  No. of listed companies                            2,031                                 383                                   1,488                                     50                                        2,178                                     47
 3  No. of listed H shares                             256                                   22                                    n.a.                                      n.a.                                      n.a.                                      n.a.
 4  No. of listed red-chips stocks                     170                                   5                                     n.a.                                      n.a.                                      n.a.                                      n.a.
 5  Total no. of listed securities                     12,573                                384                                   n.a.                                      n.a.                                      n.a.                                      n.a.
 6  Total market capitalisation<br>(Bil. dollars)      HKD 31,956                            HKD 109                               RMB 32,945                                RMB 81                                    RMB 22,237                                RMB 50
 7  Total negotiable <br>capitalisation (Bil. doll...  n.a.                                  n.a.                                  RMB 28,756                                RMB 81                                    RMB 16,938                                RMB 49
 8  Average P/E ratio (Times)                          11.16                                 19.76                                 13.90                                     9.18                                      24.70                                     9.55
 9  Total turnover <br>(Mil. shares)                   196,082                               560                                   15,881                                    15                                        22,655                                    14
10  Total turnover <br>(Mil. dollars)                  HKD 79,397                            HKD 160                               RMB 169,934                               RMB 85                                    RMB 260,208                               RMB 57
11  Total market turnover<br>(Mil. dollars)            HKD 79,557                            HKD 79,557                            RMB 176,232                               RMB 176,232                               RMB 260,264                               RMB 260,264
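To see what the negative lookahead `(?!$)` in the filter does, here is a small sketch using the standard `re` module (pandas' `str.contains` performs a regex search by default). Two labels are copied from the output above; the third, bare label is a hypothetical input added for contrast. `parse_amount` is an illustrative helper, not part of the original answer, for the case where the cell values are wanted as numbers.

```python
import re

pattern = re.compile(r'Total market capitalisation(?!$)')

labels = [
    'Total market capitalisation<br>(Bil. dollars)',  # taken from the output above
    'Total market turnover<br>(Mil. dollars)',        # taken from the output above
    'Total market capitalisation',                    # hypothetical: phrase ends the string
]
# The lookahead rejects a match when the phrase is immediately followed by end of string
matches = [bool(pattern.search(label)) for label in labels]
print(matches)


def parse_amount(text):
    """Split a cell such as 'HKD 31,956' into (currency, integer amount)."""
    currency, number = text.split()
    return currency, int(number.replace(',', ''))


print(parse_amount('HKD 31,956'))
```

Only the first label matches: the second does not contain the phrase, and the third contains it but ends right after it, which is exactly what `(?!$)` excludes.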