Ruby on Rails "invalid byte sequence in UTF-8" caused by a bot

When a Chinese bot (http://www.easou.com/search/spider.html) crawls my website, it triggers an error.

My application runs on Ruby 1.9.3 and Rails 3.2.X.

Here is the error report with the stack trace:

    An ArgumentError occurred in listings#show:

      invalid byte sequence in UTF-8
      rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params'

    -------------------------------
    Request:
    -------------------------------

      * URL        : http://www.my-website.com
      * IP address: XXXX
      * Parameters: {"action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"}
      * Rails root: /.../releases/20140708150222
      * Timestamp : 2014-07-09 02:57:43 +0200

    -------------------------------
    Backtrace:
    -------------------------------

      rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params'
      rack (1.4.5) lib/rack/utils.rb:96:in `block in parse_nested_query'
      rack (1.4.5) lib/rack/utils.rb:93:in `each'
      rack (1.4.5) lib/rack/utils.rb:93:in `parse_nested_query'
      rack (1.4.5) lib/rack/request.rb:332:in `parse_query'
      actionpack (3.2.18) lib/action_dispatch/http/request.rb:275:in `parse_query'
      rack (1.4.5) lib/rack/request.rb:209:in `POST'
      actionpack (3.2.18) lib/action_dispatch/http/request.rb:237:in `POST'
      actionpack (3.2.18) lib/action_dispatch/http/parameters.rb:10:in `parameters'

    -------------------------------
    Session:
    -------------------------------

      * session id: nil
      * data: {}

    -------------------------------
    Environment:
    -------------------------------

      * CONTENT_LENGTH : 514
      * CONTENT_TYPE : application/x-www-form-urlencoded
      * HTTP_ACCEPT : text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
      * HTTP_ACCEPT_ENCODING : gzip, deflate
      * HTTP_ACCEPT_LANGUAGE : zh;q=0.9,en;q=0.8
      * HTTP_CONNECTION : close
      * HTTP_HOST : www.my-website.com
      * HTTP_REFER : http://www.my-website.com/
      * HTTP_USER_AGENT : Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
      * ORIGINAL_FULLPATH : /
      * PASSENGER_APP_SPAWNER_IDLE_TIME : -1
      * PASSENGER_APP_TYPE : rack
      * PASSENGER_CONNECT_PASSWORD : [FILTERED]
      * PASSENGER_DEBUGGER : false
      * PASSENGER_ENVIRONMENT : production
      * PASSENGER_FRAMEWORK_SPAWNER_IDLE_TIME : -1
      * PASSENGER_FRIENDLY_ERROR_PAGES : true
      * PASSENGER_GROUP :
      * PASSENGER_MAX_REQUESTS : 0
      * PASSENGER_MIN_INSTANCES : 1
      * PASSENGER_SHOW_VERSION_IN_HEADER : true
      * PASSENGER_SPAWN_METHOD : smart-lv2
      * PASSENGER_USER :
      * PASSENGER_USE_GLOBAL_QUEUE : true
      * PATH_INFO : /
      * QUERY_STRING :
      * REMOTE_ADDR : 183.60.212.153
      * REMOTE_PORT : 52997
      * REQUEST_METHOD : GET
      * REQUEST_URI : /
      * SCGI : 1
      * SCRIPT_NAME :
      * SERVER_PORT : 80
      * SERVER_PROTOCOL : HTTP/1.1
      * SERVER_SOFTWARE : nginx/1.2.6
      * UNION_STATION_SUPPORT : false
      * _ : _
      * action_controller.instance : listings#show
      * action_dispatch.backtrace_cleaner : #<Rails::BacktraceCleaner:0x000000056e8660>
      * action_dispatch.cookies : #<ActionDispatch::Cookies::CookieJar:0x00000006564e28>
      * action_dispatch.logger : #<ActiveSupport::TaggedLogging:0x0000000318aff8>
      * action_dispatch.parameter_filter : [:password, /RAW_POST_DATA/, /RAW_POST_DATA/, /RAW_POST_DATA/]
      * action_dispatch.remote_ip : 183.60.212.153
      * action_dispatch.request.content_type : application/x-www-form-urlencoded
      * action_dispatch.request.parameters : {"action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"}
      * action_dispatch.request.path_parameters : {:action=>"show", :controller=>"listings", :id=>"location-t7-villeurbanne--58"}
      * action_dispatch.request.query_parameters : {}
      * action_dispatch.request.request_parameters : {}
      * action_dispatch.request.unsigned_session_cookie: {}
      * action_dispatch.request_id : 9f8afbc8ff142f91ddbd9cabee3629f3
      * action_dispatch.routes : #<ActionDispatch::Routing::RouteSet:0x0000000339f370>
      * action_dispatch.show_detailed_exceptions : false
      * action_dispatch.show_exceptions : true
      * rack-cache.allow_reload : false
      * rack-cache.allow_revalidate : false
      * rack-cache.cache_key : Rack::Cache::Key
      * rack-cache.default_ttl : 0
      * rack-cache.entitystore : rails:/
      * rack-cache.ignore_headers : ["Set-Cookie"]
      * rack-cache.metastore : rails:/
      * rack-cache.private_headers : ["Authorization", "Cookie"]
      * rack-cache.storage : #<Rack::Cache::Storage:0x000000039c5768>
      * rack-cache.use_native_ttl : false
      * rack-cache.verbose : false
      * rack.errors : #<IO:0x000000006592a8>
      * rack.input : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0>
      * rack.multiprocess : true
      * rack.multithread : false
      * rack.request.cookie_hash : {}
      * rack.request.form_hash :
      * rack.request.form_input : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0>
      * rack.request.form_vars :    W "  陷q B  )     F  PZ  8   & G\y P  u T ed  . % mxEAẳ\ d* Hg   C賳 lj      U 1  ]pgt P  Ɗ   c"    LX  D   HR y  p`6 l   lN P  l S    `V4y  c    X2  &JO!  *p  l  - гU  w }g ԍk   (  FJ   q : 5G Jh pί    ࡃ]  z h     d } }
      * rack.request.query_hash : {}
      * rack.request.query_string :
      * rack.run_once : false
      * rack.session : {}
      * rack.session.options : {:path=>"/", :domain=>nil, :expire_after=>nil, :secure=>false, :httponly=>true, :defer=>false, :renew=>false, :coder=>#<Rack::Session::Cookie::Base64::Marshal:0x000000034d4ad8>, :id=>nil}
      * rack.url_scheme : http
      * rack.version : [1, 0]

As you can see, there is no invalid UTF-8 in the URL; it only appears in rack.request.form_vars. I get about 100 errors a day, all similar to this one.
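A minimal, Rack-free sketch of why these bytes blow up: in Ruby, matching a regexp against a String that is labelled UTF-8 but contains invalid bytes raises ArgumentError with the message "invalid byte sequence in UTF-8", which is the error reported from normalize_params. The pattern below is only modelled on the kind of regexp Rack uses to split "a[b][c]"-style parameter names, and the two bytes are a hypothetical sample, not the bot's actual body.

```ruby
# Two bytes that are not valid UTF-8 (hypothetical sample), labelled UTF-8.
raw_name = [0xE9, 0xFF].pack("C*").force_encoding(Encoding::UTF_8)

# Modelled on the kind of regexp Rack uses to split "a[b][c]" parameter names.
NAME_PATTERN = /\A[\[\]]*([^\[\]]+)\]*/

# Returns the ArgumentError message raised by the match, or nil if none.
def match_error(name)
  name =~ NAME_PATTERN
  nil
rescue ArgumentError => e
  e.message
end

puts raw_name.valid_encoding?           # false
puts match_error(raw_name)              # invalid byte sequence in UTF-8
puts match_error("listing[id]").inspect # nil
```

A well-formed name such as "listing[id]" matches without complaint; only the mislabelled bytes trigger the exception.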

So I tried to force UTF-8 on rack.request.form_vars with something like this:

    class RackFormVarsSanitizer
      def initialize(app)
        @app = app
      end

      def call(env)
        if env["rack.request.form_vars"]
          env["rack.request.form_vars"] = env["rack.request.form_vars"].force_encoding('UTF-8')
        end
        @app.call(env)
      end
    end

I register it in my application.rb:

    config.middleware.use "RackFormVarsSanitizer"

It doesn't seem to work, because I still get the errors. The problem is that I can't test it in development mode, since I don't know how to set rack.request.form_vars.
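A likely reason the middleware has no effect: String#force_encoding only changes the encoding label and leaves the bytes untouched, so a body with invalid bytes is exactly as invalid after the call. On top of that, in Rack 1.x rack.request.form_vars is filled in by Rack::Request#POST while it reads and parses the body, so it is probably still unset when a middleware near the front of the stack runs. The sketch below contrasts relabelling with actually removing the bad bytes; sanitize_utf8 is a hypothetical helper, and the UTF-16 round trip is a commonly used substitute for String#scrub on Rubies older than 2.1, such as the 1.9.3 used here.

```ruby
# force_encoding only relabels: the bytes are unchanged, so an invalid
# body stays invalid.
body = [0x71, 0x3D, 0xE9, 0xFF].pack("C*") # "q=" followed by two bad bytes
relabelled = body.dup.force_encoding(Encoding::UTF_8)
puts relabelled.valid_encoding? # false

# Hypothetical helper that actually removes invalid byte sequences.
def sanitize_utf8(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  if utf8.respond_to?(:scrub)
    utf8.scrub("") # Ruby 2.1+: drop every invalid byte sequence
  else
    # Older Rubies (e.g. 1.9.3): transcoding with invalid: :replace drops the
    # bad bytes, and converting back yields a clean UTF-8 string.
    utf8.encode(Encoding::UTF_16LE, invalid: :replace, undef: :replace, replace: "")
        .encode(Encoding::UTF_8)
  end
end

clean = sanitize_utf8(body)
puts clean.valid_encoding? # true
puts clean                 # q=
```

Valid multibyte characters (accents, Chinese text) survive either branch; only the broken sequences are dropped.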

I installed the utf8-cleaner gem, but it didn't fix the problem.

Does anyone have an idea how to fix this? Or how to trigger it in development?
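One way to reproduce this locally without setting rack.request.form_vars by hand is to send a request whose form-urlencoded body contains bytes that are not valid UTF-8, and let Rails parse it. The sketch below uses Ruby's standard Net::HTTP; the helper names, URL and byte values are hypothetical. It sends a POST for simplicity: the crawler sent a GET with a body, but the trace above shows Rails reaching the same body-parsing code (Rack::Request#POST) for that GET, so a POST should exercise the same path.

```ruby
require "net/http"
require "uri"

# Builds a POST whose form-urlencoded body has invalid UTF-8 bytes in the
# parameter name, similar in shape to the crawler's body (hypothetical bytes).
def build_invalid_form_request(url)
  uri = URI(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  request["Content-Type"] = "application/x-www-form-urlencoded"
  request.body = [0xE9, 0xFF, 0x3D, 0x31].pack("C*") # "\xE9\xFF=1"
  [uri, request]
end

# Sends the request and returns the HTTP status code as a String.
def send_invalid_form_request(url)
  uri, request = build_invalid_form_request(url)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }.code
end
```

With the app running locally (for example via rails server on port 3000), calling send_invalid_form_request("http://localhost:3000/") should, if the problem reproduces, produce the same ArgumentError in log/development.log. Any path routed to an action that reads params should do.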

So you don't have to piece together the comments on my other answer, here is what I'm doing now. I've seen no errors for 24 hours, so it looks very promising:

Add rack-utf8_sanitizer to your Gemfile:

    gem 'rack-utf8_sanitizer'

and run

    bundle
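To show what a sanitizer at the very front of the middleware stack buys you, here is a deliberately simplified, hypothetical middleware. SimpleFormBodySanitizer is a made-up name, and this is not the rack-utf8_sanitizer gem's code or API. It drops invalid UTF-8 bytes from a form-urlencoded body before anything downstream parses it. It relies on String#scrub, so it needs Ruby 2.1 or later, and it does not handle invalid bytes hidden behind percent-escapes such as %FF.

```ruby
require "stringio"

# Hypothetical, simplified middleware (not the rack-utf8_sanitizer gem's code):
# removes invalid UTF-8 bytes from a form-urlencoded request body before any
# parameter parsing happens further down the stack. Needs String#scrub (2.1+).
class SimpleFormBodySanitizer
  FORM_TYPE = "application/x-www-form-urlencoded"

  def initialize(app)
    @app = app
  end

  def call(env)
    input = env["rack.input"]
    if input && env["CONTENT_TYPE"].to_s.start_with?(FORM_TYPE)
      raw = input.read
      clean = raw.force_encoding(Encoding::UTF_8).scrub("")
      # Rack expects a binary (ASCII-8BIT), rewindable input stream.
      env["rack.input"] = StringIO.new(clean.b)
      env["CONTENT_LENGTH"] = clean.bytesize.to_s
    end
    @app.call(env)
  end
end
```

Replacing rack.input (rather than rack.request.form_vars) matters because the input stream is what Rack actually reads when it parses the body.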

Put this middleware in app/middleware/handle_invalid_percent_encoding.rb and rename the class to HandleInvalidPercentEncoding (ExceptionApp is a bit too generic).
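The middleware referred to here isn't reproduced in this post. As a rough, hypothetical sketch of the idea (answer malformed form bodies with 400 Bad Request instead of letting them become a 500), something like the following could work; the body inspection, log message and response are all assumptions. It checks the body eagerly, before calling the rest of the app, because an exception raised later inside Rails would most likely be rendered by Rails' own exception-handling middleware before reaching this one. Any ArgumentError that isn't a percent-encoding error is re-raised.

```ruby
require "logger"
require "uri"

# Hypothetical sketch, not the middleware referred to in the answer: answer
# 400 Bad Request for form-urlencoded bodies that contain invalid %-escapes
# or decode to invalid UTF-8, instead of letting Rails raise a 500 later.
class HandleInvalidPercentEncoding
  FORM_TYPE = "application/x-www-form-urlencoded"

  def initialize(app, logger = Logger.new($stderr))
    @app = app
    @logger = logger
  end

  def call(env)
    # Check eagerly: an error raised later inside Rails would most likely be
    # rendered by Rails' own exception handling before it reached us.
    problem = form_body_problem(env)
    return bad_request(problem) if problem
    @app.call(env)
  end

  private

  # Returns a description of what is wrong with the body, or nil if it looks fine.
  def form_body_problem(env)
    input = env["rack.input"]
    return nil unless input && env["CONTENT_TYPE"].to_s.start_with?(FORM_TYPE)

    body = input.read
    input.rewind # leave the body readable for the rest of the stack
    body.split("&").each do |pair|
      pair.split("=", 2).each do |part|
        decoded = URI.decode_www_form_component(part, Encoding::UTF_8)
        return "invalid byte sequence in UTF-8" unless decoded.valid_encoding?
      end
    end
    nil
  rescue ArgumentError => e
    # Only percent-encoding errors are handled; anything else is re-raised.
    raise unless e.message.start_with?("invalid %-encoding")
    e.message
  end

  def bad_request(problem)
    @logger.info("Returning 400 for malformed form body: #{problem}")
    body = "Bad Request"
    [400, { "Content-Type" => "text/plain; charset=utf-8",
            "Content-Length" => body.bytesize.to_s }, [body]]
  end
end
```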

In config/application.rb, inside the config block, add:

    require "#{Rails.root}/app/middleware/handle_invalid_percent_encoding.rb"

    # NOTE: These must be in this order relative to each other.
    # HandleInvalidPercentEncoding just raises for encoding errors it doesn't cover,
    # so it must run after (= be inserted before) Rack::UTF8Sanitizer.
    config.middleware.insert 0, HandleInvalidPercentEncoding
    config.middleware.insert 0, Rack::UTF8Sanitizer # from a gem
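The ordering comment can be checked with a toy model. Each insert 0 puts the new middleware in front of everything already there, so the one inserted last is outermost and sees the request first. The Recorder class and the chain-building code below are illustrative only and are not Rails' MiddlewareStack, although they nest middleware the same way: the first entry in the list wraps all the others.

```ruby
# Toy middleware that records its name when it sees the request, then passes it on.
class Recorder
  def initialize(app, name, log)
    @app, @name, @log = app, name, log
  end

  def call(env)
    @log << @name
    @app.call(env)
  end
end

log = []
endpoint = lambda { |env| log << "Rails app"; [200, {}, ["ok"]] }

# Mimic the two `insert 0` calls: each later insert lands in front.
names = []
names.insert(0, "HandleInvalidPercentEncoding")
names.insert(0, "Rack::UTF8Sanitizer")

# Build the chain from the innermost middleware outwards.
app = names.reverse.inject(endpoint) { |inner, name| Recorder.new(inner, name, log) }
app.call({})
p log # ["Rack::UTF8Sanitizer", "HandleInvalidPercentEncoding", "Rails app"]
```

So Rack::UTF8Sanitizer cleans the request first, and HandleInvalidPercentEncoding only sees what the sanitizer let through.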

Deploy. Done.

(app just happens to be where middleware lives in the project I'm working on, but I would probably prefer lib. Either should work.)

Add this line to your Gemfile, then run bundle in the terminal:

    gem "handle_invalid_percent_encoding_requests"

This solution is based on Henrik's answer, turned into a gem packaged as a Rails Engine.

There is an issue in the gem's repo with a link to someone's possible solution; they say it worked for them, but they aren't sure whether it is a good solution.

I haven't tried it yet, but I think I will.