Plotting two histograms from two .csv datasets to compare data in R (ggplot)

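I want to compare two datasets (Ethereum price and transaction volume) in a single plot. I made a plot, but I think something is wrong with the scale of the y-axis: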

library(ggplot2)

ETH_price <- read.table(file = '~/R/export-EtherPrice.csv', header = TRUE, sep = ";")
transaction_volume <- read.csv(file = '~/R/export-TxGrowth.csv', sep = ";")

head(ETH_price)
head(transaction_volume)

ETH_price$Date.UTC. <- as.Date(ETH_price$Date.UTC., format = "%m/%d/%Y")
str(ETH_price) # verify the date format

transaction_volume$Date.UTC. <- as.Date(transaction_volume$Date.UTC., format = "%m/%d/%Y")
str(transaction_volume) # verify the date format

ggplot(ETH_price, aes(x = Date.UTC., y = Value)) +
  geom_point() +
  geom_line(aes(color = "ETH_price")) +
  geom_line(data = transaction_volume, aes(y = Value, color = "transaction_volume")) +
  labs(color = "Legend") +
  scale_colour_manual("", breaks = c("ETH_price", "transaction_volume"), values = c("blue", "brown")) +
  ggtitle("Correlation of ETH price and transaction volume") +
  theme(plot.title = element_text(lineheight = .7, face = "bold"))

The following error occurs:

Error in seq.int(0,to0 - from,by) : 'to' must be a finite number

The data looks like this (transaction_volume):

> head(transaction_volume)

   Date.UTC. UnixTimeStamp Value
1 03.03.2017    1488499200 64294
2 04.03.2017    1488585600 58756
3 05.03.2017    1488672000 57031
4 06.03.2017    1488758400 57020
5 07.03.2017    1488844800 62589
6 08.03.2017    1488931200 55386

The plot looks like this: [image]

Does anyone know what might be going wrong?

I am grateful for any hint! :)

MAiniak

/ Code updated

liuchun227728 answered:

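To summarize the key steps for solving the problem:

1) You have to fix the date format so that ggplot can plot the dates correctly.

2) Because your ETH_price and transaction_volume values are on different scales, plotting them on a single graph requires the trick described by @r2evans in this post: two y-axes with different scales for two datasets in ggplot2 [duplicate].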
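As a minimal sketch of step 1, the snippet below uses the first three rows of the question's sample output. A format string that does not match the text makes as.Date return NA for every row, which is a plausible cause of the "'to' must be a finite number" error (that link is an assumption, since the full files are not shown). The UnixTimeStamp column suggests the dates are day-first, and the cross-check at the end tests that reading.

```r
# Sample values copied from the question's head(transaction_volume) output
dates_raw <- c("03.03.2017", "04.03.2017", "05.03.2017")
unix_ts   <- c(1488499200, 1488585600, 1488672000)

# The question's format expects "/" separators, so nothing matches: all NA
bad <- as.Date(dates_raw, format = "%m/%d/%Y")
print(bad)

# Day-first format with "." separators
good <- as.Date(dates_raw, format = "%d.%m.%Y")
print(good)

# Cross-check against the Unix timestamps (seconds since 1970-01-01 UTC)
from_ts <- as.Date(as.POSIXct(unix_ts, origin = "1970-01-01", tz = "UTC"))
print(all(good == from_ts))
```

If the cross-check prints TRUE, the day-first format agrees with the timestamps, and either column can serve as the x-axis.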

Putting this together, your code should look like this:

# A small part of your dataset, re-created here just for the example
Date.UTC. = c("03.03.2017", "04.03.2017", "05.03.2017", "06.03.2017", "07.03.2017", "08.03.2017")
Value = c(64294, 58756, 57031, 57020, 62589, 55386)
transaction_volume = data.frame(Date.UTC., Value)

Value = c(19.54, 19.45, 20.45, 22.67, 23.34, 21.89)
ETH_price = data.frame(Date.UTC., Value)

# Managing the date format (the dates are day.month.year)
ETH_price$Date.UTC. = as.Date(ETH_price$Date.UTC., format = "%d.%m.%Y")
transaction_volume$Date.UTC. = as.Date(transaction_volume$Date.UTC., format = "%d.%m.%Y")
str(ETH_price) # to check the format of your dataset
str(transaction_volume) # to check the format of your dataset

# Label each dataset so that both can be stacked into one data frame
ETH_price$z = "ETH_price"
transaction_volume$z = "transaction_volume"

# Defining the scale factor (adapt this part to your plotting preferences)
scale_factor = mean(transaction_volume$Value / ETH_price$Value)
df_temp = within(transaction_volume, {Value = Value / scale_factor})
df = rbind(ETH_price, df_temp)
df

# Plotting both datasets
library(ggplot2)
mycolors = c("ETH_price" = "blue", "transaction_volume" = "red")
ggplot(df, aes(x = Date.UTC., y = Value, group = z, color = z)) +
  geom_line() +
  # The secondary axis multiplies by scale_factor to show the original volume values
  scale_y_continuous(name = "ETH_price", sec.axis = sec_axis(~ . * scale_factor, name = "transaction_volume")) +
  scale_color_manual(name = "Datasets", values = mycolors) +
  theme(
    axis.title.y = element_text(color = mycolors["ETH_price"]),
    axis.text.y = element_text(color = mycolors["ETH_price"]),
    axis.title.y.right = element_text(color = mycolors["transaction_volume"]),
    axis.text.y.right = element_text(color = mycolors["transaction_volume"])
  )

You should then get the following plot: [image]

I think this should solve your problem ;)
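The arithmetic behind the second-axis trick can be checked without drawing anything. The sketch below reuses the six example values from the answer. The names price, volume, volume_scaled and volume_back are illustrative only. It shows that dividing by the scale factor brings the volume into the price range, and that multiplying by the same factor (which is what the secondary axis does to its labels) recovers the original values.

```r
# Values from the answer's example data
price  <- c(19.54, 19.45, 20.45, 22.67, 23.34, 21.89)
volume <- c(64294, 58756, 57031, 57020, 62589, 55386)

# One constant that brings the volume into the range of the price
scale_factor <- mean(volume / price)

# What is actually drawn against the primary (price) axis
volume_scaled <- volume / scale_factor

# What the secondary axis shows: the division undone
volume_back <- volume_scaled * scale_factor

print(range(price))
print(range(volume_scaled))
print(all.equal(volume_back, volume))
```

The mean ratio is only one possible choice. Any positive constant works as long as the same value is used both for the division and inside sec_axis.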


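Thanks for your answer!

I checked the dataset and found that a few corrupted rows were being dropped. Now I have a very basic question (sorry, I have only just started with R). In Excel the data looks like this: [image: Excel_data]

If I convert the first column back, the dates disappear, because the column is not formatted as a date and shows a seemingly random number instead. So far I have only managed to import the dataset into R with all the data in the first column. I will try the original code with the new data, which currently looks like this in R: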

    > head(transaction_volume)

   Date.UTC. UnixTimeStamp Value
1 03.03.2017    1488499200 64294
2 04.03.2017    1488585600 58756
3 05.03.2017    1488672000 57031
4 06.03.2017    1488758400 57020
5 07.03.2017    1488844800 62589
6 08.03.2017    1488931200 55386

How can I read the data so that R recognizes it in the same way as when all the data sits in the first column of the .csv?

Sorry for the trouble.
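One possible way to approach this, sketched under assumptions: the file is taken to be semicolon-separated with day-first dates, as the sample output suggests, and the header name Date(UTC) is a guess based on the Date.UTC. column name that R shows. The csv_text string is a hypothetical stand-in for the real file, so the example runs on its own.

```r
# Hypothetical file contents in the layout the question shows
csv_text <- "Date(UTC);UnixTimeStamp;Value
03.03.2017;1488499200;64294
04.03.2017;1488585600;58756
05.03.2017;1488672000;57031"

# text = stands in for file = '~/R/export-TxGrowth.csv'
transaction_volume <- read.csv(text = csv_text, sep = ";", stringsAsFactors = FALSE)

# Parse the first column as day.month.year
transaction_volume$Date.UTC. <- as.Date(transaction_volume$Date.UTC., format = "%d.%m.%Y")

str(transaction_volume)
```

With the real file, replacing text = csv_text by the file argument should behave the same way, provided the separator and the date layout match these assumptions. Opening and re-saving the file in Excel may change both, so reading the original export directly is the safer route.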

Link to this article: https://www.f2er.com/3140965.html
