Parse error when trying to remove the time and change the date format?

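I'm trying to remove the time from the dates in my social media dataset and change the date format, so that it is compatible with my stock data when I merge the two datasets.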

Here is a sample of my social media dataset:

0       id      created_at
1       1       7:51 PM ET Fri,17 July 2020
2       2       7:33 PM ET Fri,17 July 2020
4       4       7:25 PM ET Fri,17 July 2020
5       5       4:24 PM ET Fri,17 July 2020
…       …       …
3076    3076    10:15 AM ET Tue,26 Dec 2017
3077    3077    11:12 AM ET Thu,20 Sept 2018
3078    3078    7:07 PM ET Fri,22 Dec 2017
3079    3079    7:07 PM ET Fri,22 Dec 2017
3080    3080    6:52 PM ET Fri,22 Dec 2017

I'm trying to make the dates look like this:

Date        Open    High
2017-12-22  2684.22 2685.35
2017-12-26  2679.09 2682.74
2017-12-27  2682.10 2685.64
2017-12-28  2686.10 2687.66
2017-12-29  2689.15 2692.12
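For the merge to line up, both Date columns need the same type (datetime64) with the time of day removed. A minimal sketch, assuming the post timestamps have already been parsed; the frame names posts and stock and the column posted are illustrative, and the prices are copied from the sample above:

```python
import pandas as pd

# Stock prices from the sample above; Date arrives as ISO strings
stock = pd.DataFrame({
    "Date": ["2017-12-22", "2017-12-26", "2017-12-27"],
    "Open": [2684.22, 2679.09, 2682.10],
    "High": [2685.35, 2682.74, 2685.64],
})
stock["Date"] = pd.to_datetime(stock["Date"])

# Hypothetical, already-parsed post timestamps (time of day still attached)
posts = pd.DataFrame({
    "id": [3078, 3080, 3076],
    "posted": pd.to_datetime(
        ["2017-12-22 19:07", "2017-12-22 18:52", "2017-12-26 10:15"]
    ),
})

# Drop the time of day (set it to midnight) so each post matches a trading date
posts["Date"] = posts["posted"].dt.normalize()

# Left merge: keep every post, attach that day's prices if there are any
merged = posts.merge(stock, on="Date", how="left")
print(merged[["id", "Date", "Open", "High"]])
```

With how="left" every post is kept even on days without a price row (weekends, holidays), leaving NaN in Open/High; how="inner" would keep only trading days.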

Here is what I tried, but it didn't work:

pd.to_datetime(data['created_at'])

But I get this error:

 ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data,dayfirst,yearfirst,utc,errors,require_iso8601,allow_object)
   2053         try:
-> 2054             values,tz_parsed = conversion.datetime_to_datetime64(data)
   2055             # If tzaware,these values represent unix timestamps,so we

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ParserError                               Traceback (most recent call last)
<ipython-input-13-34e0ddb54ab0> in <module>
----> 1 pd.to_datetime(data['created_at'])

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg,format,exact,unit,infer_datetime_format,origin,cache)
    801             result = arg.map(cache_array)
    802         else:
--> 803             values = convert_listlike(arg._values,format)
    804             result = arg._constructor(values,index=arg.index,name=arg.name)
    805     elif isinstance(arg,(ABCDataFrame,abc.MutableMapping)):

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg,name,tz,exact)
    457         assert format is None or infer_datetime_format
    458         utc = tz == "utc"
--> 459         result,tz_parsed = objects_to_datetime64ns(
    460             arg,
    461             dayfirst=dayfirst,

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data,allow_object)
   2057             return values.view("i8"),tz_parsed
   2058         except (ValueError,TypeError):
-> 2059             raise e
   2060 
   2061     if tz_parsed is not None:

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data,allow_object)
   2042 
   2043     try:
-> 2044         result,tz_parsed = tslib.array_to_datetime(
   2045             data,
   2046             errors=errors,

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime_object()

pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()

~/opt/anaconda3/lib/python3.8/site-packages/dateutil/parser/_parser.py in parse(timestr,parserinfo,**kwargs)
   1366         return parser(parserinfo).parse(timestr,**kwargs)
   1367     else:
-> 1368         return DEFAULTPARSER.parse(timestr,**kwargs)
   1369 
   1370 

~/opt/anaconda3/lib/python3.8/site-packages/dateutil/parser/_parser.py in parse(self,timestr,default,ignoretz,tzinfos,**kwargs)
    641 
    642         if res is None:
--> 643             raise ParserError("Unknown string format: %s",timestr)
    644 
    645         if len(res) == 0:

ParserError: Unknown string format: created_at 

Thanks for your help :)

Edit: Sample of dataset

Answer from fanhouyihui:

Split on the comma, keep the second part (the date), and convert it to datetime with pd.to_datetime:

>>> pd.to_datetime(df['created_at'].str.split(',').str[1])
1      2020-07-17
2      2020-07-17
4      2020-07-17
5      2020-07-17
3076   2017-12-26
3077   2018-09-20
3078   2017-12-22
3079   2017-12-22
3080   2017-12-22
Name: created_at, dtype: datetime64[ns]
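A variant that also survives the stray "created_at" row, sketched under the assumption that pandas and its dateutil dependency are installed; to_trading_date is a hypothetical helper name. Recent pandas releases (2.x) may infer a single format from the first value and then reject the mix of "July", "Dec" and "Sept", so here each date string is handed to dateutil individually, and rows that cannot be parsed become NaT instead of raising:

```python
import pandas as pd
from dateutil import parser


def to_trading_date(value):
    """Hypothetical helper: return the calendar date of a string such as
    '7:51 PM ET Fri,17 July 2020' as a midnight Timestamp, or NaT."""
    try:
        # Everything after the first comma is treated as the date part
        date_part = str(value).split(",", 1)[1]
        return pd.Timestamp(parser.parse(date_part)).normalize()
    except (IndexError, ValueError, OverflowError):
        # e.g. a stray 'created_at' header row: no comma, no valid date
        return pd.NaT


df = pd.DataFrame({
    "id": ["id", 1, 3076, 3077],
    "created_at": [
        "created_at",
        "7:51 PM ET Fri,17 July 2020",
        "10:15 AM ET Tue,26 Dec 2017",
        "11:12 AM ET Thu,20 Sept 2018",
    ],
})

df["Date"] = pd.to_datetime(df["created_at"].map(to_trading_date))
print(df[["created_at", "Date"]])
```

The Date column then holds midnight timestamps, which compare equal to dates parsed from strings like 2017-12-22, so it can serve directly as a merge key.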

Old answer: you can use the dateutil package (it is already installed along with pandas):

from dateutil import parser

>>> df['created_at'].apply(parser.parse,tzinfos={'ET': -4*3600})

1      2020-07-17 19:51:00-04:00
2      2020-07-17 19:33:00-04:00
4      2020-07-17 19:25:00-04:00
5      2020-07-17 16:24:00-04:00
3076   2017-12-26 10:15:00-04:00
3077   2018-09-20 11:12:00-04:00
3078   2017-12-22 19:07:00-04:00
3079   2017-12-22 19:07:00-04:00
3080   2017-12-22 18:52:00-04:00
Name: created_at, dtype: datetime64[ns, tzoffset('ET', -14400)]

If needed, you can add other time zones to the tzinfos dict.
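Note that the fixed -4*3600 offset is Eastern Daylight Time, while the December 2017 timestamps fall in Eastern Standard Time (UTC-5). A sketch, assuming dateutil's tz.gettz can resolve "America/New_York" (it falls back to dateutil's bundled tz data if the system has none), that maps "ET" to a real zone so the offset follows daylight saving:

```python
from datetime import timedelta

from dateutil import parser, tz

# Map the 'ET' token to the New York zone instead of a fixed -4h offset,
# so winter timestamps get EST (UTC-5) and summer ones EDT (UTC-4).
eastern = tz.gettz("America/New_York")
TZINFOS = {"ET": eastern}

samples = [
    "7:51 PM ET Fri,17 July 2020",
    "10:15 AM ET Tue,26 Dec 2017",
]
parsed = [parser.parse(s, tzinfos=TZINFOS) for s in samples]
for s, dt in zip(samples, parsed):
    print(s, "->", dt.isoformat())
```

The same dict can be passed to df['created_at'].apply(parser.parse, tzinfos=TZINFOS); since the rows then carry different UTC offsets, pandas may keep them as an object column rather than a single tz-aware dtype.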

Update

ParserError: Unknown string format: created_at.

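This exception is raised because the df['created_at'] column contains a value that is literally "created_at" (in the sample above it is row 0, which looks like the header line read in as data). For example: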

>>> df
   id                    created_at
0   0                         hello  # <- it's not a valid datetime
1   1  7:51 PM ET Fri,17 July 2020
2   2  7:33 PM ET Fri,17 July 2020

>>> df['created_at'].apply(parser.parse,tzinfos={'ET': -4*3600})

---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)

...

ParserError: Unknown string format: hello  # 'hello' is not a valid datetime

To find the bad rows, search for every value that contains neither "AM" nor "PM":

>>> df.loc[~df['created_at'].str.contains(r'(?:AM|PM)'),'created_at']

1    hello
Name: created_at, dtype: object
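Once the bad rows are identified, they can be dropped before parsing. A sketch, assuming every valid value follows the shape of the sample rows (the regex PATTERN is an assumption, not something the data guarantees); rows that don't match are reported and removed, and the rest are parsed as in the old answer:

```python
import pandas as pd
from dateutil import parser

# Assumed shape of a valid value: '7:51 PM ET Fri,17 July 2020'
PATTERN = r"^\d{1,2}:\d{2} [AP]M ET \w{3},\s*\d{1,2} \w+ \d{4}$"

df = pd.DataFrame({
    "id": ["id", 1, 2],
    "created_at": [
        "created_at",  # stray header row that breaks parsing
        "7:51 PM ET Fri,17 July 2020",
        "7:33 PM ET Fri,17 July 2020",
    ],
})

# Boolean mask: True where the value matches the expected pattern
valid = df["created_at"].str.match(PATTERN)
print("Dropping:", df.loc[~valid, "created_at"].tolist())

# Parse only the rows that passed the check
clean = df[valid].copy()
clean["created_at"] = clean["created_at"].apply(
    parser.parse, tzinfos={"ET": -4 * 3600}
)
print(clean)
```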
