我有以下boto3
草稿脚本
#!/usr/bin/env python3
import boto3
client = boto3.client('athena')
BUCKETS='buckets.txt'
DATABASE='some_db'
QUERY_STR="""CREATE EXTERNAL TABLE IF NOT EXISTS some_db.{}(
BucketOwner STRING,Bucket STRING,RequestDateTime STRING,RemoteIP STRING,Requester STRING,RequestID STRING,Operation STRING,Key STRING,Requesturi_operation STRING,Requesturi_key STRING,Requesturi_httpProtoversion STRING,HTTPstatus STRING,ErrorCode STRING,BytesSent BIGINT,ObjectSize BIGINT,TotalTime STRING,TurnAroundTime STRING,Referrer STRING,UserAgent STRING,VersionId STRING,HostId STRING,SigV STRING,CipherSuite STRING,AuthType STRING,EndPoint STRING,TLSVersion STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1','input.regex' = '([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\") ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$' )
LOCATION 's3://my-bucket/{}'"""
with open(BUCKETS,'r') as f:
lines = f.readlines()
for line in lines:
query_string = QUERY_STR.format(line,line)
response = client.create_named_query(
Name=line,Database=DATABASE,QueryString=QUERY_STR
)
print(response)
执行后,所有响应均返回状态代码200
。
为什么我看不到应该创建的对应表?
我应该(至少)不能看到那些存储在哪里的查询吗?
update1 :我现在正尝试通过上述查询实际创建表,如下所示:
for line in lines:
query_string = QUERY_STR.format(DATABASE,line[:-1].replace('-','_'),line[:-1])
try:
response1 = client.start_query_execution(
QueryString=query_string,WorkGroup=WORKGROUP,QueryExecutionContext={
'Database': DATABASE
},ResultConfiguration={
'OutputLocation': OUTPUT_BUCKET,},)
query_execution_id = response1['ResponseMetadata']['RequestId']
print(query_execution_id)
except Exception as e1:
print(query_string)
raise(e1)
再次,该脚本会输出一些查询ID(似乎没有错误),但是不会创建表。
我还遵循了@John Rotenstein的建议,并按如下方式初始化了我的boto3
客户:
client = boto3.client('athena',region_name='us-east-1')